On Lie derivatives of tensor fields

Differential geometry provides a number of ways of extending the familiar notion of the derivative of a real-valued function to enable differentiation of various types of tensor fields that ‘live’ on manifolds, such as scalars, contravariant vectors, one-forms (also known as covariant vectors) and mixed tensors. The problem that needs to be overcome in such cases is the fact that partial differentiation of tensors is generally not tensorial, i.e., the result is not itself a tensor. The reason is that the process of differentiation involves subtracting tensors living in different tangent spaces on a curved manifold, so their difference does not transform in the same way as either of the tensors individually. For example, consider a contravariant vector field X^a. In the tangent space at point P the transformation law for X^a is

X^{\prime a} = \big[\frac{\partial x^{\prime a}}{\partial x^b}\big]_P X^b

whereas in the tangent space at point Q the transformation law is

X^{\prime a} = \big[\frac{\partial x^{\prime a}}{\partial x^b}\big]_Q X^b

If we imagine these two tangent spaces at points P and Q on the manifold separated by a small distance \delta u, the derivative would involve computing

\lim_{\delta u \rightarrow 0} \frac{[X^a]_Q - [X^a]_P}{\delta u}

and the difference in the numerator will clearly not transform like either of the contravariant vectors individually since their transformation matrices are evaluated at different points. The derivative will not therefore be a tensor itself.

The usual way of getting around this is to introduce some kind of auxiliary field on to the manifold which provides a link between the two tensors, thus enabling them to transform in the same way (with respect to the auxiliary field) when they are subtracted. In the present note I want to explore the particular concept of the Lie derivative of a tensor field (named after the mathematician Marius Sophus Lie, 1842-1899) which employs this auxiliary field approach by introducing a contravariant vector field on to the manifold. To this end, suppose we define a vector field X^a (x). This can be used to define streamlines in the manifold (also called a congruence of curves) as the solutions of the ordinary differential equations

\frac{dx^a}{du} = X^a (x(u))

where u is a parameter determining the position on a given streamline. The equations encapsulate the fact that each point on a streamline has a tangent vector belonging to the vector field.

Example Using the notation x^i to denote the i-th coordinate, suppose the vector field is X = (1, x^2). Then X^1 = 1, X^2 = x^2 and the streamlines for this vector field are obtained by solving simultaneously the differential equations

\frac{dx^1}{du} = 1

\frac{dx^2}{du} = x^2

Solving the first equation gives

x^1 = u + c^1

Solving the second equation gives

x^2 = e^{c^2}e^u

Using the solution of the first equation to substitute for u in the solution of the second one we get

x^2 = e^{c^2} e^{x^1 - c^1} = e^{c^2 - c^1}e^{x^1} \equiv Ce^{x^1}

Therefore the streamlines of the vector field (1, x^2) are the graphs of the equation

x^2 = Ce^{x^1}

where C is a constant. Some of the streamlines are shown in the figure below.

[Figure: streamlines of the vector field X = (1, x^2), i.e., the curves x^2 = Ce^{x^1}]
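As a quick sanity check, the streamline equations can also be solved symbolically. The following minimal sketch (assuming SymPy is available; the code is mine, not part of the original derivation) reproduces the two solutions obtained above:

import sympy as sp

u = sp.symbols('u')
x1, x2 = sp.Function('x1'), sp.Function('x2')

# dx^1/du = 1 and dx^2/du = x^2, solved independently
print(sp.dsolve(sp.Eq(x1(u).diff(u), 1), x1(u)))      # Eq(x1(u), C1 + u)
print(sp.dsolve(sp.Eq(x2(u).diff(u), x2(u)), x2(u)))  # Eq(x2(u), C1*exp(u))

Eliminating u between the two printed solutions gives x^2 = Ce^{x^1}, in agreement with the streamline equation above.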

Now suppose we want to find the Lie derivative of a tensor field, T^{ab \cdots}_{cd \cdots}(x), using the vector field X^a(x). The essential idea is to use the streamlines of the vector field to link the tensor at some point P, T^{ab\cdots}_{cd \cdots}(P), with the tensor at some neighbouring point Q, T^{ab\cdots}_{cd \cdots}(Q), in such a way that the two will have the same transformation matrix at point Q (with respect to the auxiliary vector field). We can then subtract the two tensors at Q and so define the derivative at P by a limiting process as Q tends to P. In all such cases, the technique begins by considering a coordinate transformation from P to Q of the form

x^{\prime a} = x^a + \delta u X^a(x)

where \delta u is arbitrarily small. The point Q with coordinates x^{\prime a} lies on the streamline through P which the vector field X^a(x) generates. Differentiating the coordinate transformation we get

\frac{\partial x^{\prime a}}{\partial x^b} = \frac{\partial x^a}{\partial x^b} + \delta u \  \partial_b X^a

= \delta^a_b + \delta u \  \partial_b X^a

where \delta^a_b is the Kronecker delta and \partial_b \equiv \frac{\partial}{\partial x^b}. What we now do is consider the effect of the above coordinate transformation on the tensor field T^{ab\cdots}_{cd \cdots} at the points P and Q. In what follows I will employ this general procedure to obtain the Lie derivative formulas with respect to a contravariant vector field X^a(x) in the case of a scalar field \phi, a contravariant vector field Y^a, a one-form field Y_a, and a general mixed tensor field T^{ab\cdots}_{cd \cdots}.

The Lie derivative of a scalar field \phi

Not surprisingly this is the easiest case to deal with since scalars are invariants: the values of a scalar field defined over a manifold do not change under a change in the coordinate system being used. The value of the scalar field at the point P will simply be \phi(x) and the value at the point Q will be

\phi(x^{\prime}) = \phi(x^c + \delta u X^c)

We can expand this in a Taylor series about the point P with coordinates x to get the first-order approximation

\phi(x^{\prime}) \approx \phi(x) + \delta u X^c \ \partial_c \phi(x)

The Lie derivative of the scalar field with respect to the contravariant vector field X^a(x) is then

L_X \phi = \lim_{\delta u \rightarrow 0} \frac{\phi(x^{\prime}) - \phi(x)}{\delta u} = X^c \ \partial_c \phi

\equiv X^a \ \partial_a \phi

We observe that the Lie derivative of the scalar field \phi(x) with respect to the vector field X^a(x) is actually the directional derivative of \phi in the direction of the vector X^a. In the differential geometry literature in this area it is common to associate the contravariant vector field X with the linear differential operator X^a \ \partial_a (which operates on any real-valued function f to produce another function g) and essentially treat them as the same object. Given a point P on the manifold, one thinks of the partial differential operators [\partial_a]_P as constituting a basis for all the vectors in the tangent space at P, so that any vector at P can be written as a linear combination of the [\partial_a]_P in the form

[X]_P = [X^a]_P [\partial_a]_P

This is the intuitive justification for treating the vector field X and the linear differential operator X^a \ \partial_a as being the same things. Under this convention, one often sees the Lie derivative of a scalar field \phi with respect to the contravariant vector field X written as

L_X \phi = X \phi
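As a concrete illustration (a minimal SymPy sketch; the vector field from the streamline example is chosen purely for definiteness), L_X \phi is computed as the directional derivative X^a \partial_a \phi:

import sympy as sp

x1, x2 = sp.symbols('x1 x2')
coords = (x1, x2)
X = (1, x2)                       # the vector field from the earlier example
phi = sp.Function('phi')(x1, x2)  # a generic scalar field

L_X_phi = sum(Xa*sp.diff(phi, xa) for Xa, xa in zip(X, coords))
print(L_X_phi)  # Derivative(phi(x1, x2), x1) + x2*Derivative(phi(x1, x2), x2)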

The Lie derivative of a contravariant vector field Y^a

Under the coordinate transformation from P to Q given by

x^{\prime a} = x^a + \delta u X^a(x)

the contravariant vector field Y^a(x) at P is mapped to

Y^{\prime a}(x^{\prime}) = \frac{\partial x^{\prime a}}{\partial x^b} Y^b(x)

= (\delta^a_b + \delta u \ \partial_b X^a) Y^b(x)

= Y^a(x) + \delta u \ Y^b \ \partial_b X^a

The vector already at Q, namely Y^a(x^{\prime}), has a first-order Taylor series approximation about x of the form

Y^a(x^{\prime}) = Y^a(x^c + \delta u X^c)

\approx Y^a(x) + \delta u \ X^c \ \partial_c Y^a(x)

The Lie derivative with respect to the vector field X^a(x) is then given by

L_X Y^a = \lim_{\delta u \rightarrow 0} \frac{Y^a (x^{\prime}) - Y^{\prime a}(x^{\prime})}{\delta u}

= X^c \ \partial_c Y^a - Y^b \ \partial_b X^a

\equiv X^b \ \partial_b Y^a - Y^b \ \partial_b X^a

Under the convention of associating the vector field X with the linear differential operator X^a \partial_a, one often sees the Lie derivative of a contravariant vector field Y^a with respect to the field X written as

L_X Y^a = [X, Y]^a

where [X, Y] = XY - YX is called the Lie bracket (or commutator) of the two vector fields X and Y. This is a new vector field (and therefore linear differential operator) that can be written alternatively as

[X, Y] = X(Y^a \partial_a) - Y(X^a \partial_a)

= X^b \partial_b (Y^a \partial_a) - Y^b \partial_b (X^a \partial_a)

= (X^b \partial_b Y^a - Y^b \partial_b X^a) \partial_a + X^a Y^b(\partial_b \partial_a - \partial_a \partial_b)

= (X^b \partial_b Y^a - Y^b \partial_b X^a) \partial_a

where the last equality follows from the fact that the second term in the penultimate line will always vanish by Young’s Theorem (equality of cross-partials). Therefore the a-th component of the vector field [X, Y] is the one that appears in the expression of the Lie derivative L_X Y^a above.
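This identity is easy to verify symbolically. The sketch below (SymPy assumed; the two fields are arbitrary illustrative choices, nothing canonical) applies the operator commutator XY - YX to a test function and confirms that it agrees with the component formula X^b \partial_b Y^a - Y^b \partial_b X^a:

import sympy as sp

x1, x2 = sp.symbols('x1 x2')
coords = (x1, x2)
X = (x2, x1*x2)               # arbitrary illustrative vector fields
Y = (x1**2, sp.sin(x2))
f = sp.Function('f')(x1, x2)  # test function

def apply_field(V, g):  # the linear differential operator V^a ∂_a acting on g
    return sum(Va*sp.diff(g, xa) for Va, xa in zip(V, coords))

commutator = sp.expand(apply_field(X, apply_field(Y, f))
                       - apply_field(Y, apply_field(X, f)))
bracket = [sum(X[b]*sp.diff(Y[a], coords[b]) - Y[b]*sp.diff(X[a], coords[b])
               for b in range(2)) for a in range(2)]
print(sp.simplify(commutator - apply_field(bracket, f)))  # 0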

The Lie derivative of a one-form (covariant vector) field Y_a

Under the coordinate transformation from P to Q given by

x^{\prime a} = x^a + \delta u X^a(x)

the one-form (covariant vector) field Y_a(x) at P is mapped to

Y^{\prime}_a(x^{\prime}) = \frac{\partial x^b}{\partial x^{\prime a}} Y_b(x)

To work out the transformation matrix here we need to write the coordinate transformation as

x^{\prime b} = x^b + \delta u X^b(x)

\implies

x^b = x^{\prime b} - \delta u X^b(x)

Partially differentiating we get

\frac{\partial x^b}{\partial x^{\prime a}} = \delta^b_a - \delta u \frac{\partial}{\partial x^{\prime a}} X^b

= \delta^b_a - \delta u \ \partial_c X^b \frac{\partial x^c}{\partial x^{\prime a}}

= \delta^b_a - \delta u \ \partial_c X^b \big(\delta^c_a - \delta u \frac{\partial}{\partial x^{\prime a}} X^c\big)

= \delta^b_a - \delta u \ \partial_a X^b + O((\delta u)^2)

We can ignore the O((\delta u)^2) terms as they will disappear in the limiting process of the differentiation, so we have

Y^{\prime}_a(x^{\prime}) = \frac{\partial x^b}{\partial x^{\prime a}} Y_b(x)

= \big( \delta^b_a - \delta u \ \partial_a X^b \big) Y_b(x)

= Y_a(x) - \delta u Y_b \ \partial_a X^b

Again taking a first-order Taylor series approximation about x at the point Q we get that

Y_a(x^{\prime}) = Y_a(x^c + \delta u X^c)

\approx Y_a(x) + \delta u X^c \ \partial_c Y_a(x)

Then the Lie derivative of the one-form field Y_a(x) with respect to the contravariant vector field X^a(x) is obtained as

L_X Y_a = \lim_{\delta u \rightarrow 0} \frac{Y_a(x^{\prime}) - Y^{\prime}_a (x^{\prime})}{\delta u}

= X^c \partial_c Y_a + Y_b \partial_a X^b

= X^b \partial_b Y_a + Y_b \partial_a X^b
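A useful consistency check (a SymPy sketch with arbitrarily chosen illustrative fields): since Y_a Z^a is a scalar, its Lie derivative must reduce to the directional derivative X^b \partial_b (Y_a Z^a), and this is exactly what fixes the plus sign on the Y_b \partial_a X^b term:

import sympy as sp

x1, x2 = sp.symbols('x1 x2')
coords = (x1, x2)
X = (x2, x1*x2)          # auxiliary contravariant field X^a
Z = (sp.cos(x1), x1*x2)  # contravariant field Z^a
Y = (x1**2, sp.exp(x2))  # one-form field Y_a

LX_Z = [sum(X[b]*sp.diff(Z[a], coords[b]) - Z[b]*sp.diff(X[a], coords[b])
            for b in range(2)) for a in range(2)]
LX_Y = [sum(X[b]*sp.diff(Y[a], coords[b]) + Y[b]*sp.diff(X[b], coords[a])
            for b in range(2)) for a in range(2)]

scalar = sum(Y[a]*Z[a] for a in range(2))
lhs = sum(X[b]*sp.diff(scalar, coords[b]) for b in range(2))  # L_X of a scalar
rhs = sum(LX_Y[a]*Z[a] + Y[a]*LX_Z[a] for a in range(2))      # Leibniz rule
print(sp.simplify(lhs - rhs))  # 0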

The Lie derivative of a mixed tensor field T^{ab\cdots}_{cd \cdots}

A good prototype for this case is the Lie derivative of the simplest type of mixed tensor field, the rank-2 tensor of type (1, 1) represented as T^a_b(x). We will therefore consider this case first and then use it to extrapolate to a general mixed tensor field of type (p, q) represented as T^{ab \cdots}_{cd \cdots}(x).

Under the coordinate transformation

x^{\prime a} = x^a + \delta u X^a(x)

the mixed tensor field T^a_b(x) transforms as

T^{\prime a}_b(x^{\prime}) = \frac{\partial x^{\prime a}}{\partial x^c} \frac{\partial x^d}{\partial x^{\prime b}} T^c_d(x)

= (\delta^a_c + \delta u \ \partial_c X^a)(\delta^d_b - \delta u \ \partial_b X^d) T^c_d

= (\delta^a_c + \delta u \ \partial_c X^a)(T^c_b - \delta u T^c_d \ \partial_b X^d)

= T^a_b(x) - \delta u T^a_d \ \partial_b X^d + \delta u T^c_b \ \partial_c X^a + O((\delta u)^2)

Under a first-order Taylor series approximation about x, the tensor at Q can be written

T^a_b(x^{\prime}) = T^a_b(x^c + \delta u X^c)

\approx T^a_b(x) + \delta u X^c \ \partial_c T^a_b

The Lie derivative of T^a_b(x) with respect to the contravariant vector field X^a(x) is then

L_X T^a_b = \lim_{\delta u \rightarrow 0} \frac{T^a_b(x^{\prime}) - T^{\prime a}_b(x^{\prime})}{\delta u}

= X^c \ \partial_c T^a_b + T^a_d \ \partial_b X^d - T^c_b \ \partial_c X^a

We observe that the contravariant index a contributes a term of the form -T^c_b \ \partial_c X^a while the covariant index b contributes a term of the form T^a_d \ \partial_b X^d.

Now consider the general mixed tensor T^{ab \cdots}_{cd \cdots}. The first-order Taylor series approximation of T^{ab \cdots}_{cd \cdots}(x^{\prime}) about x gives

T^{ab \cdots}_{cd \cdots}(x^e + \delta u X^e)

\approx T^{ab \cdots}_{cd \cdots}(x) + \delta u X^e \ \partial_e T^{ab \cdots}_{cd \cdots}(x)

Therefore the first term of the Lie derivative will be X^e \ \partial_e T^{ab \cdots}_{cd \cdots}. This is of the same form as the first term of L_X T^a_b. Conveniently, it turns out that from then on each contravariant and covariant index in T^{ab \cdots}_{cd \cdots} contributes a term like the corresponding one we saw above in L_X T^a_b. Therefore the Lie derivative of the general mixed tensor field T^{ab \cdots}_{cd \cdots}(x) with respect to the contravariant vector field X^a(x) will be of the form

L_X T^{ab \cdots}_{cd \cdots} = X^e \ \partial_e T^{ab \cdots}_{cd \cdots}

- \ T^{eb \cdots}_{cd \cdots} \ \partial_e X^a \ - \ T^{ae \cdots}_{cd \cdots} \ \partial_e X^b \ - \ \cdots

+ \ T^{ab \cdots}_{ed \cdots} \ \partial_c X^e \ + \ T^{ab \cdots}_{ce \cdots} \ \partial_d X^e \ + \ \cdots
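The index pattern can be checked on a product tensor. For T^a_b = Z^a Y_b the formula above must agree with the Leibniz rule L_X (Z^a Y_b) = (L_X Z)^a Y_b + Z^a (L_X Y)_b; a SymPy sketch with the same illustrative fields as before:

import sympy as sp

x1, x2 = sp.symbols('x1 x2')
coords = (x1, x2)
X = (x2, x1*x2)
Z = (sp.cos(x1), x1*x2)
Y = (x1**2, sp.exp(x2))
T = [[Z[a]*Y[b] for b in range(2)] for a in range(2)]  # T^a_b = Z^a Y_b

# L_X T^a_b = X^c ∂_c T^a_b + T^a_c ∂_b X^c - T^c_b ∂_c X^a
LX_T = [[sum(X[c]*sp.diff(T[a][b], coords[c]) + T[a][c]*sp.diff(X[c], coords[b])
             - T[c][b]*sp.diff(X[a], coords[c]) for c in range(2))
         for b in range(2)] for a in range(2)]

LX_Z = [sum(X[c]*sp.diff(Z[a], coords[c]) - Z[c]*sp.diff(X[a], coords[c])
            for c in range(2)) for a in range(2)]
LX_Y = [sum(X[c]*sp.diff(Y[b], coords[c]) + Y[c]*sp.diff(X[c], coords[b])
            for c in range(2)) for b in range(2)]

print([[sp.simplify(LX_T[a][b] - LX_Z[a]*Y[b] - Z[a]*LX_Y[b])
        for b in range(2)] for a in range(2)])  # [[0, 0], [0, 0]]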


A closer look at how the principal, orbital and magnetic quantum numbers arise in Schrödinger’s theory of the hydrogen atom

In this note I want to explore some aspects of the solution of Schrödinger’s three-dimensional wave equation in spherical polar coordinates which pertain to the three main quantum numbers characterising the electron in a hydrogen atom: the electron’s principal, orbital and magnetic quantum numbers. In particular, I have noticed that many physics and mathematics sources tend to gloss over some (or all) of the mathematical details of how permissibility conditions on the three quantum numbers emerge naturally from validity constraints on the relevant underlying differential equations. I want to bring out these details clearly in the present note.

In general, four quantum numbers are needed to fully describe atomic electrons in many-electron atoms. These four numbers and their permissible values are:

Principal quantum number \tilde{n} = 1, 2, 3, \ldots

Orbital quantum number l = 0, 1, 2, \ldots, (\tilde{n} - 1)

Magnetic quantum number m_l = 0, \pm 1, \pm 2, \ldots, \pm l

Spin magnetic quantum number m_s = -\frac{1}{2}, +\frac{1}{2}

The principal quantum number determines the electron’s energy, the orbital quantum number its orbital angular-momentum magnitude, the magnetic quantum number its orbital angular-momentum direction, and the spin magnetic quantum number its spin direction.

I have noticed that it is often not explained clearly why, for example, the orbital quantum number cannot exceed the principal quantum number minus one, or why the magnitude of the magnetic quantum number cannot exceed that of the orbital quantum number. I will explore issues like these in depth in the context of the time-independent Schrödinger equation for the hydrogen atom, which only involves the first three quantum numbers. I will not discuss the spin magnetic quantum number in the present note.

Schrödinger’s wave equation for the hydrogen atom

In Cartesian coordinates, Schrödinger’s three-dimensional equation for the electron in the hydrogen atom is

\frac{\partial^2 \psi}{\partial x^2} + \frac{\partial^2 \psi}{\partial y^2} + \frac{\partial^2 \psi}{\partial z^2} + \frac{2m_e}{\hbar^2}(E - U) \psi = 0

where m_e denotes the electron mass. The potential energy U is the electric potential energy of a charge -e given that it is at distance r from another charge +e, namely

U = -\frac{e^2}{4 \pi \epsilon_0 r}

It is necessary to change variables in Schrödinger’s equation since the potential energy is a function of radial distance r rather than the Cartesian coordinate variables x, y and z. Given the spherical symmetry of the atom, it is sensible to proceed by changing the variables in Schrödinger’s equation to those of spherical polar coordinates (rather than changing the r variable in U to Cartesian coordinates using r = \sqrt{x^2 + y^2 + z^2}). Only the variables in the Laplacian part of Schrödinger’s equation need to be changed, so we can use the approach in my previous note on changing variables in Laplace’s equation to get

\frac{1}{r^2} \frac{\partial }{\partial r}\big( r^2 \frac{\partial \psi}{\partial r}\big) + \frac{1}{r^2 \sin \theta}\frac{\partial }{\partial \theta}\big( \sin \theta \frac{\partial \psi}{\partial \theta} \big) + \frac{1}{r^2 \sin^2 \theta}\frac{\partial^2 \psi}{\partial \phi^2} + \frac{2m_e}{\hbar^2}(E - U) \psi = 0

I will now temporarily simplify things by using the representation of the square of the angular momentum operator in spherical polar coordinates which I obtained in another previous note, namely

L^2 = - \hbar^2 \big( \frac{1}{\sin \theta}\frac{\partial}{\partial \theta} \big( \sin \theta \frac{\partial}{\partial \theta}\big) + \frac{1}{\sin^2 \theta}\frac{\partial^2}{\partial \phi^2} \big)

= - \hbar^2 r^2 \big( \frac{1}{r^2 \sin \theta}\frac{\partial}{\partial \theta} \big( \sin \theta \frac{\partial}{\partial \theta}\big) + \frac{1}{r^2 \sin^2 \theta}\frac{\partial^2}{\partial \phi^2} \big)

Using this to replace the two middle terms in Schrödinger’s equation and rearranging we get

\frac{1}{r^2} \frac{\partial }{\partial r}\big( r^2 \frac{\partial \psi}{\partial r}\big) + \frac{2m_e}{\hbar^2}(E - U) \psi = \frac{L^2}{\hbar^2 r^2} \psi

This equation can now be solved by the usual separation of variables approach. We assume that the \psi function can be expressed as a product

\psi(r, \theta, \phi) = R(r) Y(\theta, \phi)

and then substitute this back into the wave equation to get

\frac{Y}{r^2} \frac{d}{ d r}\big( r^2 \frac{d R}{d r}\big) + \frac{2m_e}{\hbar^2}(E - U) R Y = \frac{R}{\hbar^2 r^2} L^2 Y

Dividing through by \frac{R Y}{r^2} we get

\frac{1}{R} \frac{d}{d r}\big( r^2 \frac{d R}{d r}\big) + \frac{2m_e r^2}{\hbar^2}(E - U) = \frac{1}{\hbar^2 Y} L^2 Y

Since the left-hand side of this equation depends only on r while the right-hand side depends only on \theta and \phi, both sides must be equal to some constant which we can call \lambda. Setting the left and right-hand sides equal to \lambda in turn and rearranging slightly we finally get the radial equation

\frac{1}{r^2} \frac{d}{d r}\big( r^2 \frac{d R}{d r}\big) + \big[ \frac{2m_e}{\hbar^2}(E - U) - \frac{\lambda}{r^2} \big] R = 0

and the angular equation

L^2 Y = \lambda \hbar^2 Y

We can now apply separation of variables again to the angular equation. Rewriting the operator L^2 in full the angular equation becomes

- \hbar^2 \big( \frac{1}{\sin \theta}\frac{\partial}{\partial \theta} \big( \sin \theta \frac{\partial Y}{\partial \theta}\big) + \frac{1}{\sin^2 \theta}\frac{\partial^2 Y}{\partial \phi^2} \big) = \lambda \hbar^2 Y

which simplifies to

\frac{1}{\sin \theta}\frac{\partial}{\partial \theta} \big( \sin \theta \frac{\partial Y}{\partial \theta}\big) + \frac{1}{\sin^2 \theta}\frac{\partial^2 Y}{\partial \phi^2} + \lambda Y = 0

We assume that the Y function can be written as the product

Y(\theta, \phi) = \Theta (\theta) \Phi (\phi)

Substituting this into the angular equation gives

\frac{\Phi}{\sin \theta}\frac{d}{d \theta} \big( \sin \theta \frac{d \Theta}{d \theta}\big) + \frac{\Theta}{\sin^2 \theta}\frac{d^2 \Phi}{d \phi^2} + \lambda \Theta \Phi = 0

Multiplying through by \frac{\sin^2 \theta}{\Theta \Phi} and rearranging we get

-\frac{1}{\Phi} \frac{d^2 \Phi}{d \phi^2} = \frac{\sin^2 \theta}{\Theta}\big[ \frac{1}{\sin \theta}\frac{d}{d \theta}\big(\sin \theta \frac{d \Theta}{d \theta}\big) + \lambda \Theta\big]

Since the left-hand side of this equation depends only on \phi while the right-hand side depends only on \theta, both sides must be equal to some constant which we can provisionally call k. Setting the left and right-hand sides equal to k in turn and rearranging we get

\frac{d^2 \Phi}{d \phi^2} + k \Phi = 0

and

\frac{1}{\sin \theta}\frac{d}{d \theta}\big(\sin \theta \frac{d \Theta}{d \theta}\big) + \big( \lambda - \frac{k}{\sin^2 \theta}\big) \Theta = 0

We now have three ordinary differential equations that need to be solved, one for \Phi, one for \Theta and one for R. We will solve each of them in turn.

The equation for \Phi

The equation for \Phi is a straightforward second-order differential equation with auxiliary equation

\zeta^2 + k = 0

implying \zeta = \pm \sqrt{-k} if k < 0 and \zeta  = \pm i \sqrt{k} if k > 0. Therefore it has a general solution of the form

\Phi(\phi) = Ae^{\sqrt{-k} \phi} + Be^{- \sqrt{-k} \phi}

if k < 0 and \Phi(\phi) = Ae^{i \sqrt{k} \phi} + Be^{- i \sqrt{k} \phi} if k > 0, where A and B are arbitrary constants. Now, the azimuth angle \phi can take any value in (-\infty, \infty) but the function \Phi must take a single value at each point in space (since this is a required property of the quantum wave function which \Phi is a constituent of). It follows that the function \Phi must be periodic since it must take the same value at \phi and \phi + 2\pi for any given \phi. This imposes two constraints on the form of the general solution: (1) it cannot consist only of exponential functions with real arguments since these are not periodic (thus ruling out the first general solution above and thereby implying that the separation constant k must be nonnegative); (2) \sqrt{k} must be an integer. Given these constraints, it is customary in quantum mechanics to denote \pm \sqrt{k} by the letter m (it is called the magnetic quantum number) and to specify the separation constant in the angular equations as m^2, which guarantees its nonnegativity. We then state the general solution of the equation for \Phi as

\Phi(\phi) = Ae^{i m \phi} + Be^{- i m \phi}

In practice, a particular electron wave function will involve only a single value of m so only the first of the two terms in the general solution will be necessary. We can therefore state the general solution of the equation for \Phi for a given magnetic quantum number m as

\Phi(\phi) \propto e^{i m \phi}
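A minimal SymPy sketch of the two facts used here: e^{i m \phi} solves the \Phi equation with k = m^2, and the single-valuedness requirement \Phi(\phi + 2\pi) = \Phi(\phi) holds precisely when m is an integer:

import sympy as sp

phi, m = sp.symbols('phi m')
Phi = sp.exp(sp.I*m*phi)

print(sp.simplify(Phi.diff(phi, 2) + m**2*Phi))       # 0, so the ODE is solved
print(sp.simplify(Phi.subs(phi, phi + 2*sp.pi)/Phi))  # exp(2*I*pi*m)

# exp(2*I*pi*m) collapses to 1 when m is declared an integer:
n = sp.Symbol('n', integer=True)
print(sp.simplify(sp.exp(2*sp.pi*sp.I*n)))            # 1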

The equation for \Theta

Given that we now know the separation constant for the angular equations is either zero or a positive square number k = m^2, we can write the equation for \Theta as

\frac{1}{\sin \theta}\frac{d}{d \theta}\big(\sin \theta \frac{d \Theta}{d \theta}\big) + \big( \lambda - \frac{m^2}{\sin^2 \theta}\big) \Theta = 0

Expanding the first term we get

\frac{1}{\sin \theta} \cos \theta \frac{d \Theta}{d \theta} + \frac{d^2 \Theta}{d \theta^2} + \big( \lambda - \frac{m^2}{\sin^2 \theta}\big) \Theta = 0

I am now going to multiply and divide the first two terms by \sin^2 \theta to get

\sin^2 \theta \big(\frac{\cos \theta}{\sin^3 \theta} \frac{d \Theta}{d \theta} + \frac{1}{\sin^2 \theta} \frac{d^2 \Theta}{d \theta^2} \big) + \big( \lambda - \frac{m^2}{\sin^2 \theta}\big) \Theta = 0

\iff

\sin^2 \theta \big(- \frac{\cos \theta}{\sin^3 \theta} \frac{d \Theta}{d \theta} + \frac{1}{\sin^2 \theta} \frac{d^2 \Theta}{d \theta^2} + \frac{2 \cos \theta}{\sin^3 \theta} \frac{d \Theta}{d \theta}\big) + \big( \lambda - \frac{m^2}{\sin^2 \theta}\big) \Theta = 0

Now we can make the change of variable x = \cos \theta which implies dx = - \sin \theta d \theta and therefore

\frac{d \theta}{dx} = - \frac{1}{\sin \theta}

\frac{d \Theta}{d x} = \frac{d \Theta}{d \theta} \frac{d \theta}{d x} = - \frac{1}{\sin \theta} \frac{d \Theta}{d \theta}

\frac{d^2 \Theta}{d x^2} = \frac{d}{d \theta} \big[ - \frac{1}{\sin \theta} \frac{d \Theta}{d \theta} \big] \frac{d \theta}{d x} = - \frac{\cos \theta}{\sin^3 \theta} \frac{d \Theta}{d \theta} + \frac{1}{\sin^2 \theta} \frac{d^2 \Theta}{d \theta^2}

Using these in the amended form of the \Theta equation together with the fact that \sin^2 \theta = 1 - x^2, the \Theta equation becomes

(1 - x^2) \big(\frac{d^2 \Theta}{d x^2} - \frac{2x}{1 - x^2} \frac{d \Theta}{d x} \big) + \big(\lambda - \frac{m^2}{1 - x^2} \big) \Theta = 0

\iff

(1 - x^2) \frac{d^2 \Theta}{d x^2} - 2x \frac{d \Theta}{d x} + \big(\lambda - \frac{m^2}{1 - x^2} \big) \Theta = 0

We will solve this equation first for the case m = 0 (the solutions will be Legendre polynomials) and use these results to construct solutions for the case m \neq 0 (the solutions here will be the associated Legendre functions). Setting m = 0 we get

(1 - x^2) \frac{d^2 \Theta}{d x^2} - 2x \frac{d \Theta}{d x} + \lambda \Theta = 0

which has the form of a well known differential equation known as Legendre’s equation. It can be solved by assuming a series solution of the form

\Theta = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + \cdots + a_n x^n + \cdots

and then differentiating it term by term twice to get

\Theta^{\prime} = a_1  + 2 a_2 x + 3 a_3 x^2 + 4 a_4 x^3 + \cdots + n a_n x^{n-1} + \cdots

and

\Theta^{\prime \prime} = 2 a_2 + 6 a_3 x + 12 a_4 x^2 + 20 a_5 x^3 + \cdots + n (n - 1) a_n x^{n-2} + \cdots

We now substitute these into Legendre’s equation and set the coefficient of each power of x equal to zero (because \Theta must satisfy Legendre’s equation identically). We find that the coefficient of the x^n term satisfies

(n + 2)(n + 1) a_{n + 2} + (\lambda - n(n + 1)) a_n = 0

which implies

a_{n + 2} = - \frac{(\lambda - n(n + 1))}{(n + 2)(n + 1)} a_n

This formula makes it possible to find any even coefficient as a multiple of a_0 and any odd coefficient as a multiple of a_1. The general solution of our Legendre equation is then a sum of two series involving two arbitrary constants a_0 and a_1:

\Theta = a_0 \bigg \{1 - \frac{\lambda}{2!} x^2 + \frac{\lambda (\lambda - 6)}{4!} x^4 - \frac{\lambda(\lambda - 6)(\lambda - 20)}{6!} x^6 + \cdots \bigg \}

+ a_1 \bigg \{x - \frac{(\lambda - 2)}{3!} x^3 + \frac{(\lambda - 2)(\lambda - 12)}{5!} x^5 - \frac{(\lambda - 2)(\lambda - 12)(\lambda - 30)}{7!} x^7 + \cdots \bigg \}

Both of the series in this sum converge for x^2 < 1 but in general they do not converge for x^2 = 1. This is a problem for us because in our change of variables we set x = \cos \theta and we want solutions that converge for all possible values of \theta including those that result in x^2 = 1. It turns out that the only way to get such solutions is to choose integer values of \lambda that make either the a_0 or the a_1 series in the above sum terminate (the other series will generally be divergent so we remove it by setting the corresponding arbitrary constant equal to zero). This requires \lambda to take values in the quadratic sequence 0, 2, 6, 12, 20, 30, 42, 56 \ldots The l-th term of this sequence is l(l + 1), so the separation constant \lambda must be of this form, i.e., \lambda = l(l + 1) for some l = 0, 1, 2, 3, \dots. When l takes an even value the a_0 series will terminate and we can set a_1 = 0 to make the other series vanish. Conversely, when l takes an odd value the a_1 series will terminate and we can set a_0 = 0 to make the other series vanish.

From the eigenvalue equation for L^2 given earlier (L^2 Y = \lambda \hbar^2 Y) it is clear that the magnitude of the orbital angular momentum is L = \sqrt{l(l + 1)} \hbar. It is interesting to see how the form of this arises mathematically from considering series solutions to Legendre’s equation above. The parameter l is called the orbital angular momentum quantum number.

Note that negative integral values of l are allowed but they simply give solutions already obtained for positive values. For example, l = -2 gives \lambda = 2 and this makes the a_1 series terminate, yielding the polynomial solution

\Theta = a_1 x

This is exactly the same solution as the one that would be obtained if l = 1. It is therefore customary to restrict l to nonnegative values. Each possible value of l gives a polynomial solution to Legendre’s equation. For l = 0 we get \Theta = a_0, for l = 1 we get \Theta = a_1 x, for l = 2 we get \Theta = a_0 - 3 a_0 x^2, and so on. If the value of a_0 or a_1 in each polynomial equation is selected so that \Theta = 1 when x = 1 the resulting polynomials are called Legendre polynomials, denoted by P_l(x). Given that for each l we have P_l(1) = 1 the first few Legendre polynomials are

P_0(x) = 1

P_1(x) = x

P_2(x) = \frac{1}{2}(3 x^2 - 1)

P_3(x) = \frac{1}{2}\big(5 x^3 - 3x \big)

These are the physically acceptable solutions to Legendre’s equation for \Theta above.
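The termination argument can be checked mechanically. The sketch below (SymPy assumed) builds the series from the recurrence relation with \lambda = l(l+1), normalises it so that \Theta(1) = 1, and recovers SymPy's built-in Legendre polynomials:

import sympy as sp

x = sp.symbols('x')

def legendre_from_recurrence(l):
    lam = l*(l + 1)
    n = l % 2              # seed the even series (a_0) or the odd series (a_1)
    a = {n: sp.Integer(1)}
    while n + 2 <= l:      # a_{n+2} = -(lam - n(n+1)) a_n / ((n+2)(n+1))
        a[n + 2] = -sp.Rational(lam - n*(n + 1), (n + 2)*(n + 1))*a[n]
        n += 2
    poly = sum(c*x**k for k, c in a.items())
    return sp.expand(poly/poly.subs(x, 1))  # normalise so that P_l(1) = 1

for l in range(4):
    print(l, legendre_from_recurrence(l), sp.expand(sp.legendre(l, x)))

For l = 0, 1, 2, 3 the two columns agree, reproducing the polynomials listed above.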

We now consider the solutions of the equation

(1 - x^2) \frac{d^2 \Theta}{d x^2} - 2x \frac{d \Theta}{d x} + \big(\lambda - \frac{m^2}{1 - x^2} \big) \Theta = 0

for m \neq 0. We now know that \lambda = l(l + 1) so we can write this in, and we can also add the subscript l to m since the solutions to this equation will involve a link between the values of the orbital angular momentum and magnetic quantum numbers. The equation we need to solve becomes

(1 - x^2) \frac{d^2 \Theta}{d x^2} - 2x \frac{d \Theta}{d x} + \big[l(l + 1) - \frac{m_l^2}{1 - x^2} \big] \Theta = 0

The link between l and m_l arises from the fact that we are constrained in trying to solve this equation: it encompasses the case m_l = 0 for which the physically acceptable solutions are the Legendre polynomials P_l(x). Therefore the physically allowable solutions for the above equation must include the Legendre polynomials as a special case. We can find these by using the series approach again and it turns out that the physically acceptable solutions are the so-called associated Legendre functions which take the form

P_l^{m_l}(x) = (1 - x^2)^{m_l/2}\frac{d^{m_l}}{d x^{m_l}}P_l(x)

Now, each Legendre polynomial P_l(x) is a polynomial of degree l. Therefore the m_l-th order derivative in P_l^{m_l} will equal zero if |m_l| > l, so for physically acceptable solutions we must impose the constraint |m_l| \leq l in the differential equation for \Theta. This is where the link between the quantum numbers l and m_l comes from in the quantum theory of the hydrogen atom: given a value of l the acceptable values of m_l are integers in the range -l \leq m_l \leq l.

Finally, note two things: (1) P_l^{m_l}(x) reduces to the Legendre polynomial P_l(x) when m_l = 0, which is what we needed. (2) A negative value for m_l does not change m_l^2 in the original differential equation so a solution for positive m_l is also a solution for the corresponding negative m_l. Thus many references define the associated Legendre function P_l^{m_l}(x) for -l \leq m_l \leq l as P_l^{|m_l|}(x).

To conclude, given values for the quantum numbers l and m_l, the general solution of the equation for \Theta can be written as

\Theta(\theta) \propto P_l^{m_l}(\cos \theta)
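SymPy's built-in associated Legendre functions give a quick illustration of the constraint: at m_l = 0 they reduce to P_l(x), and they vanish identically once m_l exceeds l:

import sympy as sp

x = sp.symbols('x')
l = 2
for m in range(5):
    print(m, sp.simplify(sp.assoc_legendre(l, m, x)))

# m = 0 prints 3*x**2/2 - 1/2 = P_2(x); m = 3 and m = 4 print 0,
# so only |m_l| <= l yields a nontrivial solution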

The radial equation for R

To clarify where the principal quantum number comes from, the final equation we need to deal with is the radial equation

\frac{1}{r^2} \frac{d}{d r}\big( r^2 \frac{d R}{d r}\big) + \big[ \frac{2m_e}{\hbar^2}(E - U) - \frac{\lambda}{r^2} \big] R = 0

Writing \lambda = l(l + 1) and replacing U with the formula for the potential energy we get

\frac{1}{r^2} \frac{d}{d r}\big( r^2 \frac{d R}{d r}\big) + \big[ \frac{2m_e}{\hbar^2}\big(\frac{e^2}{4 \pi \epsilon_0 r} + E\big) - \frac{l(l + 1)}{r^2} \big] R = 0

\iff

\frac{d^2 R}{d r^2} + \frac{2}{r} \frac{d R}{d r} + \frac{2m_e}{\hbar^2} \big[ E + \frac{e^2}{4 \pi \epsilon_0 r} - \frac{l(l + 1) \hbar^2}{2 m_e r^2} \big] R = 0

We are only interested in solutions for which the electron is bound within the atom, so we take E < 0 (the negative energy of the electron is the amount of energy that must be supplied to it to free it from the atom). In order to solve the above equation it is customary to make the change of variable

\rho = \big(-\frac{8 m_e E}{\hbar^2}\big)^{1/2} r

and define the dimensionless constant

\tau = \frac{e^2}{4 \pi \epsilon_0 \hbar} \big(-\frac{m_e}{2 E} \big)^{1/2}

If we then specify R = R(\rho) we have

\frac{d R}{d r} = \frac{d R}{d \rho} \frac{d \rho}{d r} = \big(-\frac{8 m_e E}{\hbar^2}\big)^{1/2} \frac{d R}{d \rho}

\frac{2}{r} \frac{d R}{d r} = \big(-\frac{8 m_e E}{\hbar^2}\big) \frac{2}{\rho} \frac{d R}{d \rho}

\frac{d^2 R}{d r^2} = \big(-\frac{8 m_e E}{\hbar^2}\big)^{1/2} \frac{d^2 R}{d \rho^2} \frac{d \rho}{d r} = \big(-\frac{8 m_e E}{\hbar^2}\big) \frac{d^2 R}{d \rho^2}

\frac{2 m_e}{\hbar^2} E + \frac{2 m_e}{\hbar^2} \frac{e^2}{4 \pi \epsilon_0 r} = \big(-\frac{8 m_e E}{\hbar^2}\big) \big \{\frac{1}{4}\frac{e^2}{4 \pi \epsilon_0 r} \big(-\frac{1}{E}\big) - \frac{1}{4}\big \} = \big(-\frac{8 m_e E}{\hbar^2}\big) \big(\frac{\tau}{\rho} - \frac{1}{4}\big)

\frac{l(l + 1)}{r^2} = \big(-\frac{8 m_e E}{\hbar^2}\big) \frac{l(l + 1)}{\rho^2}

Using these results we can rewrite the differential equation as

\big(-\frac{8 m_e E}{\hbar^2}\big) \bigg \{\frac{d^2 R}{d \rho^2} + \frac{2}{\rho} \frac{d R}{d \rho} + \big[\frac{\tau}{\rho} - \frac{1}{4} - \frac{l(l + 1)}{\rho^2}\big] R(\rho) \bigg \} = 0

\iff

\frac{d^2 R}{d \rho^2} + \frac{2}{\rho} \frac{d R}{d \rho} + \big[\frac{\tau}{\rho} - \frac{1}{4} - \frac{l(l + 1)}{\rho^2}\big] R(\rho) = 0

To make further progress we consider the behaviour of this differential equation as \rho \rightarrow \infty. It reduces to

\frac{d^2 R}{d \rho^2} - \frac{1}{4} R = 0

which is a straightforward second-order differential equation with auxiliary equation

\zeta^2 - \frac{1}{4} = 0

\implies \zeta = \pm \frac{1}{2}

The positive solution to the auxiliary equation implies a term in the general solution of the form e^{\rho/2} which is unacceptable since it explodes as \rho \rightarrow \infty. Therefore we only accept the negative solution to the auxiliary equation and the general solution for R as \rho \rightarrow \infty must be of the form

R \propto e^{-\rho/2}

This suggests we can try an exact solution of the full differential equation of the form

R = e^{-\rho/2} F(\rho)

Differentiating this twice we get

\frac{d R}{d \rho} = -\frac{1}{2} e^{-\rho/2} F(\rho) + e^{-\rho/2} F^{\prime} (\rho)

\frac{d^2 R}{d \rho^2} = \frac{1}{4} e^{-\rho/2} F(\rho) - \frac{1}{2} e^{-\rho/2} F^{\prime}(\rho) - \frac{1}{2} e^{-\rho/2} F^{\prime}(\rho) + e^{-\rho/2} F^{\prime \prime} (\rho)

Substituting these into the differential equation

\frac{d^2 R}{d \rho^2} + \frac{2}{\rho} \frac{d R}{d \rho} + \big[\frac{\tau}{\rho} - \frac{1}{4} - \frac{l(l + 1)}{\rho^2}\big] R(\rho) = 0

gives

F^{\prime \prime}(\rho) + \frac{(2 - \rho)}{\rho} F^{\prime}(\rho) + \big[\frac{(\tau - 1)}{\rho} - \frac{l(l + 1)}{\rho^2}\big] F(\rho) = 0

\iff

\rho^2 F^{\prime \prime}(\rho) + \rho (2 - \rho) F^{\prime}(\rho) + \big[\rho (\tau - 1) - l(l + 1)\big] F(\rho) = 0

We can now try to solve this latest version of the differential equation by the method of Frobenius, which involves assuming a generalised power series solution of the form

F(\rho) = a_0 \rho^s + a_1 \rho^{s + 1} + a_2 \rho^{s + 2} + \cdots

Differentiating twice we get

F^{\prime}(\rho) = s a_0 \rho^{s-1} + (s + 1) a_1 \rho^s + (s + 2) a_2 \rho^{s + 1} + \cdots

F^{\prime \prime}(\rho) = (s - 1) s a_0 \rho^{s-2} + s (s + 1) a_1 \rho^{s-1} + (s + 1) (s + 2) a_2 \rho^s + \cdots

Then the terms appearing in the differential equation have the generalised power series forms

\rho^2 F^{\prime \prime}(\rho) = (s - 1) s a_0 \rho^s + s (s + 1) a_1 \rho^{s+1} + (s + 1) (s + 2) a_2 \rho^{s+2} + \cdots

2 \rho F^{\prime}(\rho) = 2 s a_0 \rho^s + 2 (s + 1) a_1 \rho^{s+1} + 2 (s + 2) a_2 \rho^{s + 2} + \cdots

-\rho^2 F^{\prime}(\rho) = - s a_0 \rho^{s+1} - (s + 1) a_1 \rho^{s+2} - (s + 2) a_2 \rho^{s + 3} - \cdots

(\tau - 1) \rho F(\rho) = (\tau - 1) a_0 \rho^{s+1} + (\tau - 1) a_1 \rho^{s + 2} + (\tau - 1) a_2 \rho^{s + 3} + \cdots

-l(l + 1) F(\rho) = -l(l + 1) a_0 \rho^s - l(l + 1) a_1 \rho^{s + 1} - l(l + 1) a_2 \rho^{s + 2} - \cdots

Summing these terms (remembering that the sum must be identically equal to zero) we find the coefficient of \rho^s to be

[s(s - 1) + 2s - l(l+1)]a_0 = 0

\implies

s(s + 1) - l(l+1) = 0

\implies

s = l or s = -l - 1

Now, when s = -l - 1 the first term of the power series for F(\rho) is a_0/\rho^{l+1} which explodes as \rho \rightarrow 0. This is unacceptable so we discard this solution and set s = l.

For the coefficient of \rho^{s + n} we get

[(s+n)(s+n-1) + 2(s+n) - l(l+1)]a_n + [(\tau - 1) - (s+n-1)]a_{n-1} = 0

Setting s = l and rearranging gives us the recurrence equation

a_n = \frac{(l + n - \tau)}{(l+n+1)(l+n) - l(l+1)} a_{n-1}

From this recurrence equation we observe that

a_n \approx \frac{1}{n}a_{n-1} \sim \frac{1}{n!}a_0

as n \rightarrow \infty. We deduce from this that the series for F(\rho) behaves like a_0 \rho^l \sum \frac{\rho^n}{n!} as n \rightarrow \infty and therefore R = e^{-\rho/2} F(\rho) behaves like a_0 \rho^l e^{\rho/2}. However, this diverges as \rho \rightarrow \infty which is unacceptable, so we conclude that the series for F(\rho) must terminate at some value of n which we will call N. In this case we have a_{N+1} = 0 which the recurrence equation tells us can only happen if

\tau = l + N + 1 \equiv \tilde{n}

This is how the principal quantum number \tilde{n} first appears. Now, we have

\tau = \frac{e^2}{4 \pi \epsilon_0 \hbar} \big(-\frac{m_e}{2 E} \big)^{1/2} = \tilde{n}

\iff

\big(\frac{e^2}{4 \pi \epsilon_0}\big)^2 \big(-\frac{m_e}{2 \hbar^2}\big) \frac{1}{E} = \tilde{n}^2

\iff E_{\tilde{n}} = \big(-\frac{m_e}{2 \hbar^2}\big) \big(\frac{e^2}{4 \pi \epsilon_0}\big)^2 \frac{1}{\tilde{n}^2}

These are the famous bound-state energy eigenvalues for \tilde{n} = 1, 2, \ldots. This is the same formula for the energy levels of the hydrogen atom that Niels Bohr obtained by intuitive means in his 1913 ‘solar system’ model of atomic structure.

As stated above, the integer \tilde{n} is called the principal quantum number. Recall that \tilde{n} = l + N + 1 and N cannot be smaller than zero. It follows that

\tilde{n} - l - 1 \geq 0

\iff

l \leq \tilde{n} - 1

This explains why for a given value of \tilde{n} the allowable values of l are l = 0, 1, 2, \dots, (\tilde{n}-1).
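The termination condition is easy to see numerically. In this small sketch (plain Python with exact rational arithmetic; the parameter values are illustrative), the recurrence kills every coefficient beyond a_N exactly when \tilde{n} = l + N + 1, and no termination is possible when l > \tilde{n} - 1:

from fractions import Fraction

def radial_series(n_tilde, l, n_max=8):
    # a_n = (l + n - tau) a_{n-1} / ((l+n+1)(l+n) - l(l+1)), with tau = n_tilde
    a, coeffs = Fraction(1), [Fraction(1)]
    for n in range(1, n_max + 1):
        a = a*(l + n - n_tilde)/((l + n + 1)*(l + n) - l*(l + 1))
        coeffs.append(a)
    return coeffs

print(radial_series(3, 1))  # a_0, a_1 nonzero, then a_2 = a_3 = ... = 0 (N = 1)
print(radial_series(3, 3))  # l > n_tilde - 1: no coefficient ever vanishes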

Returning to the solution of

\rho^2 F^{\prime \prime}(\rho) + \rho (2 - \rho) F^{\prime}(\rho) + \big[\rho (\tau - 1) - l(l + 1)\big] F(\rho) = 0

the above discussion suggests that we should look for a solution of the form

F(\rho) = a_0 \rho^l L(\rho)

where L(\rho) is a polynomial (rather than an infinite series). Differentiating this twice gives

F^{\prime}(\rho) = a_0 l \rho^{l-1}L(\rho) + a_0 \rho^l L^{\prime}(\rho)

F^{\prime \prime} (\rho) = a_0 (l-1) l \rho^{l-2}L(\rho) + 2 a_0 l \rho^{l-1} L^{\prime}(\rho) + a_0 \rho^l L^{\prime \prime}(\rho)

Substituting these into the differential equation and setting \tau = \tilde{n} we get

\rho^{l+2} L^{\prime \prime}(\rho) + (2l + 2 - \rho) \rho^{l+1}L^{\prime}(\rho) + (\tilde{n} - 1 - l)\rho^{l+1}L(\rho) = 0

\iff

\rho L^{\prime \prime}(\rho) + (2l + 2 - \rho) L^{\prime}(\rho) + (\tilde{n} - 1 - l) L(\rho) = 0

\iff

\rho L^{\prime \prime}(\rho) + (\alpha + 1 - \rho) L^{\prime}(\rho) + \tilde{n}^{*} L(\rho) = 0

where \alpha \equiv 2l + 1 and \tilde{n}^{*} \equiv \tilde{n} - 1 - l. This last form is a well known differential equation whose physically acceptable solutions in the present context are associated Laguerre polynomials given by the formula

L_{\tilde{n}^{*}}^{(\alpha)}(\rho) = \sum_{j = 0}^{\tilde{n}^{*}} (-1)^j \frac{(\tilde{n}^{*} + \alpha)!}{(\tilde{n}^{*}-j)!(\alpha + j)!}\frac{\rho^j}{j!}

For given quantum numbers \tilde{n} and l, the solution of the radial equation for R is then

R_{\tilde{n}l}(r) \propto e^{-\rho/2}\rho^l L_{\tilde{n} - l - 1}^{(2l + 1)}(\rho)
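A SymPy spot-check (a sketch over a few illustrative (\tilde{n}, l) pairs) that the associated Laguerre polynomials really do solve \rho L'' + (\alpha + 1 - \rho)L' + \tilde{n}^{*} L = 0 with \alpha = 2l + 1 and \tilde{n}^{*} = \tilde{n} - 1 - l:

import sympy as sp

rho = sp.symbols('rho')
for n_tilde, l in [(1, 0), (2, 0), (2, 1), (3, 1)]:
    nstar, alpha = n_tilde - 1 - l, 2*l + 1
    L = sp.assoc_laguerre(nstar, alpha, rho)
    residual = rho*L.diff(rho, 2) + (alpha + 1 - rho)*L.diff(rho) + nstar*L
    print((n_tilde, l), sp.simplify(residual))  # 0 in every case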

Final form of the electron wave function \psi

Putting everything together, for given principal quantum number \tilde{n}, orbital quantum number l and magnetic quantum number m_l, the wave function of the electron in the hydrogen atom is

\psi_{\tilde{n} l m_l}(r, \theta, \phi) \propto e^{-\rho/2}\rho^l L_{\tilde{n} - l - 1}^{(2l + 1)}(\rho) P_l^{m_l}(\cos \theta) e^{i m_l \phi}

where

\rho = \big(-\frac{8 m_e E_{\tilde{n}}}{\hbar^2}\big)^{1/2} r

and

E_{\tilde{n}} = \big(-\frac{m_e}{2 \hbar^2}\big) \big(\frac{e^2}{4 \pi \epsilon_0}\big)^2 \frac{1}{\tilde{n}^2}
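Putting numbers into the energy formula (a quick numerical sketch, assuming SciPy's CODATA constants are available) recovers the familiar Bohr levels, approximately -13.6/\tilde{n}^2 electron volts:

from scipy.constants import m_e, e, hbar, epsilon_0, pi, eV

for n in (1, 2, 3):
    E = -(m_e/(2*hbar**2))*(e**2/(4*pi*epsilon_0))**2/n**2
    print(n, E/eV)  # approximately -13.6, -3.40, -1.51 (in eV)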

Changing variables in the square of the quantum mechanical angular momentum operator

A particle of mass m with position vector \mathbf{r} and velocity \mathbf{v} (with respect to some specified origin) has a linear momentum vector \mathbf{p} = m \mathbf{v} and angular momentum vector

\mathbf{L} = \mathbf{r} \times \mathbf{p} = m \mathbf{r} \times \mathbf{v}

where \times is the vector product operation. The magnitude of the angular momentum vector is L = rmv\sin \theta where \theta is the angle between \mathbf{r} and \mathbf{v}. The direction of \mathbf{L} is given by the right-hand rule when the vectors \mathbf{r} and \mathbf{v} are placed with their tails at the same point: one curls the fingers of the right hand in the direction of rotation of \mathbf{r} into \mathbf{v} and the thumb then points in the direction of \mathbf{L}. One can find the components of \mathbf{L} in the x, y and z directions in Cartesian coordinates using

\mathbf{L} = \mathbf{r} \times \mathbf{p} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \ \\x & y & z \\ \ \\p_x & p_y & p_z \end{vmatrix}

= (yp_z - zp_y)\mathbf{i} + (zp_x - xp_z)\mathbf{j} + (xp_y - yp_x)\mathbf{k}

Therefore the components of \mathbf{L} in Cartesian coordinates are

L_x = yp_z - zp_y

L_y = zp_x - xp_z

L_z = xp_y - yp_x

In classical mechanics the angular momentum magnitude L can take a continuum of values but in quantum mechanics only certain discrete values for L are permissible. Furthermore, the linear momentum vector \mathbf{p} appearing in \mathbf{L} must obey Heisenberg’s uncertainty principle in each direction, i.e.,

\Delta x \Delta p_x \geq \frac{\hbar}{2}

in the x-direction and similarly for the y and z directions. These features of quantized variables like \mathbf{p} and \mathbf{L} make it necessary to do calculations with them in quantum mechanics using their operator representations. (For example, on the quantum scale one cannot calculate the expectation of momentum in the x-direction using an integral involving momentum as a function of x because no such function can exist: Heisenberg’s uncertainty principle prevents accurate knowledge of momentum when the value of x is known exactly. One must therefore use the operator representation of momentum rather than momentum as some function of x in order to calculate the expectation). It is a basic postulate of quantum mechanics that every observable quantity characterising a physical system can be represented by a quantum mechanical operator obtained by expressing the quantity in terms of \mathbf{r} and \mathbf{p} and then replacing the vector \mathbf{p} by - i \hbar \nabla where

\nabla = \mathbf{i} \frac{\partial}{\partial x} + \mathbf{j} \frac{\partial}{\partial y} + \mathbf{k} \frac{\partial}{\partial z}

and its components p_x, p_y and p_z by

p_x = - i \hbar \frac{\partial}{\partial x}

p_y = - i \hbar \frac{\partial}{\partial y}

p_z = - i \hbar \frac{\partial}{\partial z}

Taking this on board we can then write \mathbf{L} and L_x, L_y and L_z in quantum mechanical operator form in Cartesian coordinates as

\mathbf{L} = - i \hbar (\mathbf{r} \times \nabla)

L_x = - i \hbar \big( y \frac{\partial}{\partial z} - z \frac{\partial}{\partial y} \big)

L_y = - i \hbar \big( z \frac{\partial}{\partial x} - x \frac{\partial}{\partial z} \big)

L_z = - i \hbar \big( x \frac{\partial}{\partial y} - y \frac{\partial}{\partial x} \big)

In order to perform some calculations involving Schrödinger’s equation I needed to employ the square of the quantum mechanical angular momentum operator L^2, but in spherical polar coordinates rather than Cartesian coordinates, where

L^2 = L_x^2 + L_y^2 + L_z^2

I used the matrix calculus approach in my previous note to achieve the necessary change of variables in L^2. In the present note I want to record the details of this calculation as I have never seen this approach used elsewhere. (This change of variables can also be done using a scale factor method based on vector calculus which I will not go into here).

As in my previous note we begin the matrix calculus approach with the standard conversion equations for spherical polar coordinates:

x = r \sin \theta \cos \phi

y = r \sin \theta \sin \phi

z = r \cos \theta

Differentiating with respect to the vector (r, \theta, \phi) we get

\begin{bmatrix} \frac{\partial x}{\partial r} & \frac{\partial y}{\partial r} & \frac{\partial z}{\partial r}\\ \ \\ \frac{\partial x}{\partial \theta} & \frac{\partial y}{\partial \theta} & \frac{\partial z}{\partial \theta}\\ \ \\ \frac{\partial x}{\partial \phi} & \frac{\partial y}{\partial \phi} & \frac{\partial z}{\partial \phi}\end{bmatrix} =  \begin{bmatrix} \sin \theta \cos \phi & \sin \theta \sin \phi & \cos \theta\\ \ \\ r \cos \theta \cos \phi & r \cos \theta \sin \phi & - r \sin \theta\\ \ \\ - r \sin \theta \sin \phi & r \sin \theta \cos \phi & 0\end{bmatrix}

We can then use the matrix version of the chain rule to write

\begin{bmatrix} \frac{\partial F}{\partial r}\\ \ \\ \frac{\partial F}{\partial \theta} \\ \ \\ \frac{\partial F}{\partial \phi}\end{bmatrix} = \begin{bmatrix} \frac{\partial x}{\partial r} & \frac{\partial y}{\partial r} & \frac{\partial z}{\partial r}\\ \ \\ \frac{\partial x}{\partial \theta} & \frac{\partial y}{\partial \theta} & \frac{\partial z}{\partial \theta}\\ \ \\ \frac{\partial x}{\partial \phi} & \frac{\partial y}{\partial \phi} & \frac{\partial z}{\partial \phi}\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial x}\\ \ \\ \frac{\partial F}{\partial y} \\ \ \\ \frac{\partial F}{\partial z} \end{bmatrix} = \begin{bmatrix} \sin \theta \cos \phi & \sin \theta \sin \phi & \cos \theta\\ \ \\ r \cos \theta \cos \phi & r \cos \theta \sin \phi & - r \sin \theta\\ \ \\ - r \sin \theta \sin \phi & r \sin \theta \cos \phi & 0\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial x}\\ \ \\ \frac{\partial F}{\partial y} \\ \ \\ \frac{\partial F}{\partial z} \end{bmatrix}

We can solve this system by inverting the coefficient matrix to get

\begin{bmatrix} \frac{\partial F}{\partial x}\\ \ \\ \frac{\partial F}{\partial y} \\ \ \\ \frac{\partial F}{\partial z}\end{bmatrix} = \begin{bmatrix} \sin \theta \cos \phi & \frac{\cos \theta \cos \phi}{r} & -\frac{\sin \phi}{r \sin \theta}\\ \ \\ \sin \theta \sin \phi & \frac{\cos \theta \sin \phi}{r} & \frac{\cos \phi}{r \sin \theta}\\ \ \\ \cos \theta & -\frac{\sin \theta}{r} & 0\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial r}\\ \ \\ \frac{\partial F}{\partial \theta} \\ \ \\ \frac{\partial F}{\partial \phi} \end{bmatrix}
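The inversion can be checked symbolically; a minimal SymPy sketch (symbols declared positive so the trigonometric simplifications go through):

import sympy as sp

r, th, ph = sp.symbols('r theta phi', positive=True)
x = r*sp.sin(th)*sp.cos(ph)
y = r*sp.sin(th)*sp.sin(ph)
z = r*sp.cos(th)

# the coefficient matrix above: rows differentiate (x, y, z) by r, theta, phi
J = sp.Matrix([[sp.diff(c, v) for c in (x, y, z)] for v in (r, th, ph)])
print(sp.simplify(J.inv()))  # matches the inverted matrix quoted above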

Using the equations in this system together with the standard conversion equations we then have

y \frac{\partial F}{\partial z} = r \sin \theta \sin \phi \big( \cos \theta \frac{\partial F}{\partial r} - \frac{\sin \theta}{r} \frac{\partial F}{\partial \theta} \big) = r \sin \theta \cos \theta \sin \phi \frac{\partial F}{\partial r} - \sin^2 \theta \sin \phi \frac{\partial F}{\partial \theta}

and

z \frac{\partial F}{\partial y} = r \cos \theta \big( \sin \theta \sin \phi \frac{\partial F}{\partial r} + \frac{\cos \theta \sin \phi}{r} \frac{\partial F}{\partial \theta} + \frac{\cos \phi}{r \sin \theta} \frac{\partial F}{\partial \phi}\big)

= r \cos \theta \sin \theta \sin \phi \frac{\partial F}{\partial r} + \cos^2 \theta \sin \phi \frac{\partial F}{\partial \theta} + \frac{\cos \theta \cos \phi}{\sin \theta}\frac{\partial F}{\partial \phi}

Subtracting the second expression from the first and dropping the test function F from the partial derivatives, we can then write the angular momentum operator in the x-direction in terms of spherical polar coordinates as

L_x = - i \hbar \big( y \frac{\partial}{\partial z} - z \frac{\partial}{\partial y} \big)

= - i \hbar \big( - \cot \theta \cos \phi \frac{\partial}{\partial \phi} - \sin \phi \frac{\partial}{\partial \theta} \big)

= i \hbar \big(\cot \theta \cos \phi \frac{\partial}{\partial \phi} + \sin \phi \frac{\partial}{\partial \theta} \big)

Similarly we have

z \frac{\partial F}{\partial x} = r \cos \theta \big( \sin \theta \cos \phi \frac{\partial F}{\partial r} + \frac{\cos \theta \cos \phi}{r} \frac{\partial F}{\partial \theta} - \frac{\sin \phi}{r \sin \theta} \frac{\partial F}{\partial \phi}\big)

= r \cos \theta \sin \theta \cos \phi \frac{\partial F}{\partial r} + \cos^2 \theta \cos \phi \frac{\partial F}{\partial \theta} - \frac{\cos \theta \sin \phi}{\sin \theta}\frac{\partial F}{\partial \phi}

and

x \frac{\partial F}{\partial z} = r \sin \theta \cos \phi \big( \cos \theta \frac{\partial F}{\partial r} - \frac{\sin \theta}{r} \frac{\partial F}{\partial \theta} \big) = r \sin \theta \cos \theta \cos \phi \frac{\partial F}{\partial r} - \sin^2 \theta \cos \phi \frac{\partial F}{\partial \theta}

Therefore

L_y = - i \hbar \big( z \frac{\partial}{\partial x} - x \frac{\partial}{\partial z} \big)

= - i \hbar \big(\cos \phi \frac{\partial}{\partial \theta} - \cot \theta \sin \phi \frac{\partial}{\partial \phi} \big)

Finally, we have

x \frac{\partial F}{\partial y} = r \sin \theta \cos \phi \big( \sin \theta \sin \phi \frac{\partial F}{\partial r} + \frac{\cos \theta \sin \phi}{r} \frac{\partial F}{\partial \theta} + \frac{\cos \phi}{r \sin \theta} \frac{\partial F}{\partial \phi}\big)

= r \sin^2 \theta \cos \phi \sin \phi \frac{\partial F}{\partial r} + \sin \theta \cos \theta \sin \phi \cos \phi \frac{\partial F}{\partial \theta} + \cos^2 \phi \frac{\partial F}{\partial \phi}

and

y \frac{\partial F}{\partial x} = r \sin \theta \sin \phi \big( \sin \theta \cos \phi \frac{\partial F}{\partial r} + \frac{\cos \theta \cos \phi}{r} \frac{\partial F}{\partial \theta} - \frac{\sin \phi}{r \sin \theta} \frac{\partial F}{\partial \phi}\big)

= r \sin^2 \theta \cos \phi \sin \phi \frac{\partial F}{\partial r} + \sin \theta \cos \theta \sin \phi \cos \phi \frac{\partial F}{\partial \theta} - \sin^2 \phi \frac{\partial F}{\partial \phi}

Therefore

L_z = - i \hbar \big( x \frac{\partial}{\partial y} - y \frac{\partial}{\partial x} \big)

= - i \hbar \frac{\partial}{\partial \phi}

Having obtained the components of \mathbf{L} in spherical polar coordinates we can now finally calculate the operator representation of L^2 as

L^2 = L_x^2 + L_y^2 + L_z^2

= \big[ i \hbar \big(\cot \theta \cos \phi \frac{\partial}{\partial \phi} + \sin \phi \frac{\partial}{\partial \theta} \big) \big]^2 + \big[ - i \hbar \big(\cos \phi \frac{\partial}{\partial \theta} - \cot \theta \sin \phi \frac{\partial}{\partial \phi} \big) \big]^2 + \big[ - i \hbar \frac{\partial}{\partial \phi} \big]^2

= - \hbar^2 \big[ \sin^2 \phi \frac{\partial^2}{\partial \theta^2} + \big(\sin \phi \frac{\partial}{\partial \theta} \big) \big(\cot \theta \cos \phi \frac{\partial}{\partial \phi} \big) + \big(\cot \theta \cos \phi \frac{\partial}{\partial \phi} \big)\big(\sin \phi \frac{\partial}{\partial \theta} \big) + \cot^2 \theta \cos^2 \phi \frac{\partial^2}{\partial \phi^2} \big]

- \hbar^2 \big[ \cos^2 \phi \frac{\partial^2}{\partial \theta^2} - \big(\cos \phi \frac{\partial}{\partial \theta} \big) \big(\cot \theta \sin \phi \frac{\partial}{\partial \phi} \big) - \big(\cot \theta \sin \phi \frac{\partial}{\partial \phi} \big)\big(\cos \phi \frac{\partial}{\partial \theta} \big) + \cot^2 \theta \sin^2 \phi \frac{\partial^2}{\partial \phi^2} \big]

- \hbar^2 \frac{\partial^2}{\partial \phi^2}

= - \hbar^2 \big( \frac{\partial^2}{\partial \theta^2} + \cot^2 \theta \frac{\partial^2}{\partial \phi^2} + \frac{\partial^2}{\partial \phi^2} - \sin \phi \frac{1}{\sin^2 \theta} \cos \phi \frac{\partial}{\partial \phi} + \cot \theta \cos^2 \phi \frac{\partial}{\partial \theta}

+ \cos \phi \frac{1}{\sin^2 \theta} \sin \phi \frac{\partial}{\partial \phi} + \cot \theta \sin^2 \phi \frac{\partial}{\partial \theta}\big)

= - \hbar^2 \big( \frac{\partial^2}{\partial \theta^2} + \cot \theta \frac{\partial}{\partial \theta} + \cot^2 \theta \frac{\partial^2}{\partial \phi^2} + \frac{\partial^2}{\partial \phi^2}\big)

= - \hbar^2 \big( \frac{1}{\sin \theta}\frac{\partial}{\partial \theta} \big( \sin \theta \frac{\partial}{\partial \theta}\big) + \frac{1}{\sin^2 \theta}\frac{\partial^2}{\partial \phi^2} \big)
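As a check on this final form (a SymPy sketch; Ynm is SymPy's spherical harmonic), applying the operator to Y_l^m returns l(l+1)\hbar^2 Y_l^m, consistent with the eigenvalue equation L^2 Y = \lambda \hbar^2 Y used in the hydrogen atom note above:

import sympy as sp

th, ph, hbar = sp.symbols('theta phi hbar')

def L2(f):  # the operator just derived
    return -hbar**2*(sp.diff(sp.sin(th)*sp.diff(f, th), th)/sp.sin(th)
                     + sp.diff(f, ph, 2)/sp.sin(th)**2)

for l, m in [(1, 0), (2, 1), (3, 2)]:
    Y = sp.Ynm(l, m, th, ph).expand(func=True)  # explicit spherical harmonic
    print((l, m), sp.simplify(L2(Y)/Y))  # 2*hbar**2, 6*hbar**2, 12*hbar**2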

A matrix calculus approach to changing variables in Laplace’s equation

In three-dimensional rectangular coordinates, the partial differential equation known as Laplace’s equation takes the form

\frac{\partial^2 F}{\partial x^2} + \frac{\partial^2 F}{\partial y^2} + \frac{\partial^2 F}{\partial z^2} = 0

This equation is applicable to a wide range of problems in physics but it is often necessary to make a change of variables from rectangular to spherical polar coordinates in order to better match the spherical symmetry of particular contexts. Until now I had only ever worked out this change of variables using a scale factor method based on the formula for the Laplacian of a function F in general orthogonal curvilinear coordinates:

\nabla^2 F = \frac{1}{h_1 h_2 h_3}\bigg[ \frac{\partial }{\partial x_1}\big( \frac{h_2 h_3}{h_1}\frac{\partial F}{\partial x_1} \big) + \frac{\partial }{\partial x_2}\big( \frac{h_1 h_3}{h_2}\frac{\partial F}{\partial x_2} \big) + \frac{\partial }{\partial x_3}\big( \frac{h_1 h_2}{h_3}\frac{\partial F}{\partial x_3} \big) \bigg]

This formula is derived using basic results from vector calculus and allows Laplace’s equation to be easily expressed in any orthogonal coordinate system once the form of the Euclidean metric is known in that coordinate system. (Orthogonal coordinate systems are those for which the coordinate basis vectors are always orthogonal at each point). The Euclidean metric in general orthogonal coordinates x_1, x_2, x_3 will take the form

ds^2 = h_1^2 dx_1^2 + h_2^2 dx_2^2 + h_3^2 dx_3^2

The coefficients h_1, h_2 and h_3 are the scale factors appearing in the Laplacian formula. In the metric, these convert the coordinate differentials dx_i into lengths h_i dx_i.

In the case of rectangular coordinates we have x_1 = x, x_2 = y, x_3 = z and h_1 = h_2 = h_3 = 1 so the Euclidean metric and the Laplacian formula reduce to the familiar forms

ds^2 = dx^2 + dy^2 + dz^2

and

\nabla^2 F = \frac{\partial^2 F}{\partial x^2} + \frac{\partial^2 F}{\partial y^2} + \frac{\partial^2 F}{\partial z^2}

The scale factors are all equal to 1 in this case since the coordinate differentials are already in units of length. In the case of spherical polar coordinates (an orthogonal coordinate system) we can find the scale factors and hence the Laplacian by using the standard conversion equations

x = r \sin \theta \cos \phi

y = r \sin \theta \sin \phi

z = r \cos \theta

Taking the differentials of these, squaring them and adding them we find that the Euclidean metric in spherical polar coordinates takes the form

ds^2 = dr^2 + r^2 d \theta^2 + r^2 \sin^2 \theta d \phi^2

The scale factors in this case are therefore h_1 = 1, h_2 = r and h_3 = r \sin \theta. Putting these into the Laplacian formula we immediately get

\nabla^2 F = \frac{1}{r^2 \sin \theta} \bigg[ \frac{\partial }{\partial r}\big( \frac{r^2 \sin \theta}{1}\frac{\partial F}{\partial r} \big) + \frac{\partial }{\partial \theta}\big( \frac{r \sin \theta}{r}\frac{\partial F}{\partial \theta} \big) + \frac{\partial }{\partial \phi}\big( \frac{r}{r \sin \theta}\frac{\partial F}{\partial \phi} \big) \bigg]

= \frac{1}{r^2} \frac{\partial }{\partial r}\big( r^2 \frac{\partial F}{\partial r}\big) + \frac{1}{r^2 \sin \theta}\frac{\partial }{\partial \theta}\big( \sin \theta \frac{\partial F}{\partial \theta} \big) + \frac{1}{r^2 \sin^2 \theta}\frac{\partial^2 F}{\partial \phi^2}

Therefore Laplace’s equation in spherical polar coordinates is

\frac{1}{r^2} \frac{\partial }{\partial r}\big( r^2 \frac{\partial F}{\partial r}\big) + \frac{1}{r^2 \sin \theta}\frac{\partial }{\partial \theta}\big( \sin \theta \frac{\partial F}{\partial \theta} \big) + \frac{1}{r^2 \sin^2 \theta}\frac{\partial^2 F}{\partial \phi^2} = 0
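Before turning to the matrix calculus derivation, here is a minimal SymPy sketch confirming the form just obtained: the spherical polar Laplacian annihilates 1/r and the solid harmonic r^2(3\cos^2 \theta - 1), both of which are harmonic away from the origin:

import sympy as sp

r, th, ph = sp.symbols('r theta phi', positive=True)

def laplacian_sph(F):
    return (sp.diff(r**2*sp.diff(F, r), r)/r**2
            + sp.diff(sp.sin(th)*sp.diff(F, th), th)/(r**2*sp.sin(th))
            + sp.diff(F, ph, 2)/(r**2*sp.sin(th)**2))

print(sp.simplify(laplacian_sph(1/r)))                         # 0
print(sp.simplify(laplacian_sph(r**2*(3*sp.cos(th)**2 - 1))))  # 0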

I recently had occasion to use Laplace’s equation in spherical polar coordinates again and this time I decided to have a go at justifying the change of variables to myself using a matrix calculus approach instead of the above scale factor method. This turned out to be a laborious but interesting process. In the present note I want to record my calculations using the matrix calculus approach in detail as I have never seen something like this elsewhere. The advantage of the matrix calculus approach is that it works for any coordinate system irrespective of whether it is orthogonal or not.

In the matrix calculus approach we use the standard conversion equations for spherical polar coordinates

x = r \sin \theta \cos \phi

y = r \sin \theta \sin \phi

z = r \cos \theta

to work out the coefficient matrix in the system

\begin{bmatrix} \frac{\partial F}{\partial r}\\ \ \\ \frac{\partial F}{\partial \theta} \\ \ \\ \frac{\partial F}{\partial \phi}\end{bmatrix} = \begin{bmatrix} \frac{\partial x}{\partial r} & \frac{\partial y}{\partial r} & \frac{\partial z}{\partial r}\\ \ \\ \frac{\partial x}{\partial \theta} & \frac{\partial y}{\partial \theta} & \frac{\partial z}{\partial \theta}\\ \ \\ \frac{\partial x}{\partial \phi} & \frac{\partial y}{\partial \phi} & \frac{\partial z}{\partial \phi}\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial x}\\ \ \\ \frac{\partial F}{\partial y} \\ \ \\ \frac{\partial F}{\partial z} \end{bmatrix}

\big(which of course is equivalent to the set of equations

\frac{\partial F}{\partial r} = \frac{\partial F}{\partial x} \frac{\partial x}{\partial r} + \frac{\partial F}{\partial y} \frac{\partial y}{\partial r} + \frac{\partial F}{\partial z} \frac{\partial z}{\partial r}

\frac{\partial F}{\partial \theta} = \frac{\partial F}{\partial x} \frac{\partial x}{\partial \theta} + \frac{\partial F}{\partial y} \frac{\partial y}{\partial \theta} + \frac{\partial F}{\partial z} \frac{\partial z}{\partial \theta}

\frac{\partial F}{\partial \phi} = \frac{\partial F}{\partial x} \frac{\partial x}{\partial \phi} + \frac{\partial F}{\partial y} \frac{\partial y}{\partial \phi} + \frac{\partial F}{\partial z} \frac{\partial z}{\partial \phi} \big)

We get the system

\begin{bmatrix} \frac{\partial F}{\partial r}\\ \ \\ \frac{\partial F}{\partial \theta} \\ \ \\ \frac{\partial F}{\partial \phi}\end{bmatrix} = \begin{bmatrix} \sin \theta \cos \phi & \sin \theta \sin \phi & \cos \theta\\ \ \\ r \cos \theta \cos \phi & r \cos \theta \sin \phi & - r \sin \theta\\ \ \\ - r \sin \theta \sin \phi & r \sin \theta \cos \phi & 0\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial x}\\ \ \\ \frac{\partial F}{\partial y} \\ \ \\ \frac{\partial F}{\partial z} \end{bmatrix}

We can solve this system either by inverting the coefficient matrix or by using Cramer’s rule. Using Cramer’s rule we get

\frac{\partial F}{\partial x} = \frac{\begin{vmatrix} \frac{\partial F}{\partial r} & \sin \theta \sin \phi & \cos \theta\\ \ \\ \frac{\partial F}{\partial \theta} & r \cos \theta \sin \phi & - r \sin \theta\\ \ \\ \frac{\partial F}{\partial \phi} & r \sin \theta \cos \phi & 0\end{vmatrix}}{\begin{vmatrix} \sin \theta \cos \phi & \sin \theta \sin \phi & \cos \theta\\ \ \\ r \cos \theta \cos \phi & r \cos \theta \sin \phi & - r \sin \theta\\ \ \\ - r \sin \theta \sin \phi & r \sin \theta \cos \phi & 0\end{vmatrix}} = \frac{r^2 \sin^2 \theta \cos \phi \frac{\partial F}{\partial r} + r \cos \theta \sin \theta \cos \phi \frac{\partial F}{\partial \theta} - r \sin \phi \frac{\partial F}{\partial \phi} }{r^2 \sin \theta}
= \sin \theta \cos \phi \frac{\partial F}{\partial r} + \frac{\cos \theta \cos \phi}{r} \frac{\partial F}{\partial \theta} - \frac{\sin \phi}{r \sin \theta} \frac{\partial F}{\partial \phi}

and similarly

\frac{\partial F}{\partial y} = \frac{\begin{vmatrix} \sin \theta \cos \phi & \frac{\partial F}{\partial r} & \cos \theta\\ \ \\ r \cos \theta \cos \phi & \frac{\partial F}{\partial \theta} & - r \sin \theta\\ \ \\ - r \sin \theta \sin \phi & \frac{\partial F}{\partial \phi} & 0\end{vmatrix}}{\begin{vmatrix} \sin \theta \cos \phi & \sin \theta \sin \phi & \cos \theta\\ \ \\ r \cos \theta \cos \phi & r \cos \theta \sin \phi & - r \sin \theta\\ \ \\ - r \sin \theta \sin \phi & r \sin \theta \cos \phi & 0\end{vmatrix}} = \frac{r^2 \sin^2 \theta \sin \phi \frac{\partial F}{\partial r} + r \cos \theta \sin \theta \sin \phi \frac{\partial F}{\partial \theta} + r \cos \phi \frac{\partial F}{\partial \phi} }{r^2 \sin \theta}
= \sin \theta \sin \phi \frac{\partial F}{\partial r} + \frac{\cos \theta \sin \phi}{r} \frac{\partial F}{\partial \theta} + \frac{\cos \phi}{r \sin \theta} \frac{\partial F}{\partial \phi}

and

\frac{\partial F}{\partial z} = \frac{\begin{vmatrix} \sin \theta \cos \phi & \sin \theta \sin \phi & \frac{\partial F}{\partial r}\\ \ \\ r \cos \theta \cos \phi & r \cos \theta \sin \phi & \frac{\partial F}{\partial \theta}\\ \ \\ - r \sin \theta \sin \phi & r \sin \theta \cos \phi & \frac{\partial F}{\partial \phi}\end{vmatrix}}{\begin{vmatrix} \sin \theta \cos \phi & \sin \theta \sin \phi & \cos \theta\\ \ \\ r \cos \theta \cos \phi & r \cos \theta \sin \phi & - r \sin \theta\\ \ \\ - r \sin \theta \sin \phi & r \sin \theta \cos \phi & 0\end{vmatrix}} = \frac{r^2 \cos \theta \sin \theta \frac{\partial F}{\partial r} - r \sin^2 \theta \frac{\partial F}{\partial \theta}}{r^2 \sin \theta}
= \cos \theta \frac{\partial F}{\partial r} - \frac{\sin \theta}{r} \frac{\partial F}{\partial \theta}

We can write these solutions more concisely in matrix form as

\begin{bmatrix} \frac{\partial F}{\partial x}\\ \ \\ \frac{\partial F}{\partial y} \\ \ \\ \frac{\partial F}{\partial z}\end{bmatrix} = \begin{bmatrix} \sin \theta \cos \phi & \frac{\cos \theta \cos \phi}{r} & -\frac{\sin \phi}{r \sin \theta}\\ \ \\ \sin \theta \sin \phi & \frac{\cos \theta \sin \phi}{r} & \frac{\cos \phi}{r \sin \theta}\\ \ \\ \cos \theta & -\frac{\sin \theta}{r} & 0\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial r}\\ \ \\ \frac{\partial F}{\partial \theta} \\ \ \\ \frac{\partial F}{\partial \phi} \end{bmatrix}

The coefficient matrix here is, of course, the inverse of the coefficient matrix in the original system.
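
This can be verified mechanically. Here is a minimal sympy sketch (the name J is my own) which recomputes the determinant and the inverse:

import sympy as sp

r, th, ph = sp.symbols('r theta phi', positive=True)

# Coefficient matrix of the original system (rows correspond to d/dr, d/dtheta, d/dphi)
J = sp.Matrix([[sp.sin(th)*sp.cos(ph),    sp.sin(th)*sp.sin(ph),    sp.cos(th)],
               [r*sp.cos(th)*sp.cos(ph),  r*sp.cos(th)*sp.sin(ph), -r*sp.sin(th)],
               [-r*sp.sin(th)*sp.sin(ph), r*sp.sin(th)*sp.cos(ph),  0]])

print(sp.simplify(J.det()))              # expect r**2*sin(theta)
print(J.inv().applyfunc(sp.simplify))    # expect the inverse matrix displayed above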

To find the second-order partials \frac{\partial^2F}{\partial x^2}, \frac{\partial^2F}{\partial y^2} and \frac{\partial^2F}{\partial z^2}, let G \equiv \frac{\partial F}{\partial x}, H \equiv \frac{\partial F}{\partial y} and I \equiv \frac{\partial F}{\partial z}. Then we need to find \frac{\partial G}{\partial x}, \frac{\partial H}{\partial y} and \frac{\partial I}{\partial z}. We can use the first, second and third equations in the above matrix system to find these, with F replaced by G, H and I respectively. Thus,

\frac{\partial G}{\partial x} =  \begin{bmatrix} \sin \theta \cos \phi & \frac{\cos \theta \cos \phi}{r} & -\frac{\sin \phi}{r \sin \theta}\end{bmatrix} \begin{bmatrix} \frac{\partial G}{\partial r}\\ \ \\ \frac{\partial G}{\partial \theta} \\ \ \\ \frac{\partial G}{\partial \phi} \end{bmatrix}

\frac{\partial H}{\partial y} =  \begin{bmatrix} \sin \theta \sin \phi & \frac{\cos \theta \sin \phi}{r} & \frac{\cos \phi}{r \sin \theta}\end{bmatrix} \begin{bmatrix} \frac{\partial H}{\partial r}\\ \ \\ \frac{\partial H}{\partial \theta} \\ \ \\ \frac{\partial H}{\partial \phi} \end{bmatrix}

\frac{\partial I}{\partial z} =  \begin{bmatrix} \cos \theta & -\frac{\sin \theta}{r} \end{bmatrix} \begin{bmatrix} \frac{\partial I}{\partial r}\\ \ \\ \frac{\partial I}{\partial \theta} \end{bmatrix}

But differentiating

G \equiv \begin{bmatrix} \sin \theta \cos \phi & \frac{\cos \theta \cos \phi}{r} & -\frac{\sin \phi}{r \sin \theta}\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial r}\\ \ \\ \frac{\partial F}{\partial \theta} \\ \ \\ \frac{\partial F}{\partial \phi} \end{bmatrix}

with respect to r, \theta and \phi we get

\begin{bmatrix} \frac{\partial G}{\partial r}\\ \ \\ \frac{\partial G}{\partial \theta} \\ \ \\ \frac{\partial G}{\partial \phi}\end{bmatrix} = \begin{bmatrix} \frac{\partial^2 F}{\partial r^2} & \frac{\partial^2 F}{\partial r \partial \theta} & \frac{\partial^2 F}{\partial r \partial \phi}\\ \ \\ \frac{\partial^2 F}{\partial \theta \partial r} & \frac{\partial^2 F}{\partial \theta^2} & \frac{\partial^2 F}{\partial \theta \partial \phi}\\ \ \\ \frac{\partial^2 F}{\partial \phi \partial r} & \frac{\partial^2 F}{\partial \phi \partial \theta} & \frac{\partial^2 F}{\partial \phi^2}\end{bmatrix} \begin{bmatrix} \sin \theta \cos \phi\\ \ \\ \frac{\cos \theta \cos \phi}{r} \\ \ \\ -\frac{\sin \phi}{r \sin \theta} \end{bmatrix}  + \begin{bmatrix} 0 & -\frac{\cos \theta \cos \phi}{r^2} & \frac{\sin \phi}{r^2 \sin \theta}\\ \ \\ \cos \theta \cos \phi & -\frac{\sin \theta \cos \phi}{r} & \frac{\cos \theta \sin \phi}{r \sin^2 \theta}\\ \ \\ -\sin \theta \sin \phi & -\frac{\cos \theta \sin \phi}{r} & -\frac{\cos \phi}{r \sin \theta}\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial r}\\ \ \\ \frac{\partial F}{\partial \theta} \\ \ \\ \frac{\partial F}{\partial \phi} \end{bmatrix}
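
The entries of the second matrix (the derivatives of the coefficient row for G with respect to r, \theta and \phi) are fiddly to get right by hand, so the following sympy sketch regenerates them (the name a is my own):

import sympy as sp

r, th, ph = sp.symbols('r theta phi', positive=True)

# The coefficient row for G = dF/dx, written as a column vector
a = sp.Matrix([sp.sin(th)*sp.cos(ph),
               sp.cos(th)*sp.cos(ph)/r,
               -sp.sin(ph)/(r*sp.sin(th))])

# Row i of the second matrix holds the derivatives of a's entries with
# respect to the i-th variable, i.e. it is the transposed Jacobian of a
M = a.jacobian([r, th, ph]).T
print(M.applyfunc(sp.simplify))   # e.g. the (2, 3) entry is sin(phi)*cos(theta)/(r*sin(theta)**2)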

Similarly, differentiating

H \equiv \begin{bmatrix} \sin \theta \sin \phi & \frac{\cos \theta \sin \phi}{r} & \frac{\cos \phi}{r \sin \theta}\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial r}\\ \ \\ \frac{\partial F}{\partial \theta} \\ \ \\ \frac{\partial F}{\partial \phi} \end{bmatrix}

with respect to r, \theta and \phi we get

\begin{bmatrix} \frac{\partial H}{\partial r}\\ \ \\ \frac{\partial H}{\partial \theta} \\ \ \\ \frac{\partial H}{\partial \phi}\end{bmatrix} = \begin{bmatrix} \frac{\partial^2 F}{\partial r^2} & \frac{\partial^2 F}{\partial r \partial \theta} & \frac{\partial^2 F}{\partial r \partial \phi}\\ \ \\ \frac{\partial^2 F}{\partial \theta \partial r} & \frac{\partial^2 F}{\partial \theta^2} & \frac{\partial^2 F}{\partial \theta \partial \phi}\\ \ \\ \frac{\partial^2 F}{\partial \phi \partial r} & \frac{\partial^2 F}{\partial \phi \partial \theta} & \frac{\partial^2 F}{\partial \phi^2}\end{bmatrix} \begin{bmatrix} \sin \theta \sin \phi\\ \ \\ \frac{\cos \theta \sin \phi}{r} \\ \ \\ \frac{\cos \phi}{r \sin \theta} \end{bmatrix}  + \begin{bmatrix} 0 & -\frac{\cos \theta \sin \phi}{r^2} & -\frac{\cos \phi}{r^2 \sin \theta}\\ \ \\ \cos \theta \sin \phi & -\frac{\sin \theta \sin \phi}{r} & -\frac{\cos \theta \cos \phi}{r \sin^2 \theta}\\ \ \\ \sin \theta \cos \phi & \frac{\cos \theta \cos \phi}{r} & -\frac{\sin \phi}{r \sin \theta}\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial r}\\ \ \\ \frac{\partial F}{\partial \theta} \\ \ \\ \frac{\partial F}{\partial \phi} \end{bmatrix}

and differentiating

I \equiv \begin{bmatrix} \cos \theta & -\frac{\sin \theta}{r}\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial r}\\ \ \\ \frac{\partial F}{\partial \theta} \end{bmatrix}

with respect to r and \theta we get

\begin{bmatrix} \frac{\partial I}{\partial r}\\ \ \\ \frac{\partial I}{\partial \theta} \end{bmatrix} = \begin{bmatrix} \frac{\partial^2 F}{\partial r^2} & \frac{\partial^2 F}{\partial r \partial \theta}\\ \ \\ \frac{\partial^2 F}{\partial \theta \partial r} & \frac{\partial^2 F}{\partial \theta^2} \end{bmatrix} \begin{bmatrix} \cos \theta \\ \ \\ -\frac{\sin \theta}{r} \end{bmatrix}  + \begin{bmatrix} 0 & \frac{\sin \theta}{r^2}\\ \ \\ -\sin \theta & -\frac{\cos \theta}{r}\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial r}\\ \ \\ \frac{\partial F}{\partial \theta} \end{bmatrix}

Therefore we have

\frac{\partial G}{\partial x} =  \begin{bmatrix} \sin \theta \cos \phi & \frac{\cos \theta \cos \phi}{r} & -\frac{\sin \phi}{r \sin \theta}\end{bmatrix}\begin{bmatrix} \frac{\partial^2 F}{\partial r^2} & \frac{\partial^2 F}{\partial r \partial \theta} & \frac{\partial^2 F}{\partial r \partial \phi}\\ \ \\ \frac{\partial^2 F}{\partial \theta \partial r} & \frac{\partial^2 F}{\partial \theta^2} & \frac{\partial^2 F}{\partial \theta \partial \phi}\\ \ \\ \frac{\partial^2 F}{\partial \phi \partial r} & \frac{\partial^2 F}{\partial \phi \partial \theta} & \frac{\partial^2 F}{\partial \phi^2}\end{bmatrix} \begin{bmatrix} \sin \theta \cos \phi\\ \ \\ \frac{\cos \theta \cos \phi}{r} \\ \ \\ -\frac{\sin \phi}{r \sin \theta} \end{bmatrix}

+ \begin{bmatrix} \sin \theta \cos \phi & \frac{\cos \theta \cos \phi}{r} & -\frac{\sin \phi}{r \sin \theta}\end{bmatrix}\begin{bmatrix} 0 & -\frac{\cos \theta \cos \phi}{r^2} & \frac{\sin \phi}{r^2 \sin \theta}\\ \ \\ \cos \theta \cos \phi & -\frac{\sin \theta \cos \phi}{r} & \frac{\cos \theta \sin \phi}{r \sin^2 \theta}\\ \ \\ -\sin \theta \sin \phi & -\frac{\cos \theta \sin \phi}{r} & -\frac{\cos \phi}{r \sin \theta}\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial r}\\ \ \\ \frac{\partial F}{\partial \theta} \\ \ \\ \frac{\partial F}{\partial \phi} \end{bmatrix}

and similarly

\frac{\partial H}{\partial y} =  \begin{bmatrix} \sin \theta \sin \phi & \frac{\cos \theta \sin \phi}{r} & \frac{\cos \phi}{r \sin \theta}\end{bmatrix}\begin{bmatrix} \frac{\partial^2 F}{\partial r^2} & \frac{\partial^2 F}{\partial r \partial \theta} & \frac{\partial^2 F}{\partial r \partial \phi}\\ \ \\ \frac{\partial^2 F}{\partial \theta \partial r} & \frac{\partial^2 F}{\partial \theta^2} & \frac{\partial^2 F}{\partial \theta \partial \phi}\\ \ \\ \frac{\partial^2 F}{\partial \phi \partial r} & \frac{\partial^2 F}{\partial \phi \partial \theta} & \frac{\partial^2 F}{\partial \phi^2}\end{bmatrix} \begin{bmatrix} \sin \theta \sin \phi\\ \ \\ \frac{\cos \theta \sin \phi}{r} \\ \ \\ \frac{\cos \phi}{r \sin \theta} \end{bmatrix}

+ \begin{bmatrix} \sin \theta \sin \phi & \frac{\cos \theta \sin \phi}{r} & \frac{\cos \phi}{r \sin \theta}\end{bmatrix}\begin{bmatrix} 0 & -\frac{\cos \theta \sin \phi}{r^2} & -\frac{\cos \phi}{r^2 \sin \theta}\\ \ \\ \cos \theta \sin \phi & -\frac{\sin \theta \sin \phi}{r} & -\frac{\cos \theta \cos \phi}{r \sin^2 \theta}\\ \ \\ \sin \theta \cos \phi & \frac{\cos \theta \cos \phi}{r} & -\frac{\sin \phi}{r \sin \theta}\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial r}\\ \ \\ \frac{\partial F}{\partial \theta} \\ \ \\ \frac{\partial F}{\partial \phi} \end{bmatrix}

and

\frac{\partial I}{\partial z} =  \begin{bmatrix} \cos \theta & -\frac{\sin \theta}{r} \end{bmatrix}\begin{bmatrix} \frac{\partial^2 F}{\partial r^2} & \frac{\partial^2 F}{\partial r \partial \theta}\\ \ \\ \frac{\partial^2 F}{\partial \theta \partial r} & \frac{\partial^2 F}{\partial \theta^2} \end{bmatrix} \begin{bmatrix} \cos \theta \\ \ \\ -\frac{\sin \theta}{r} \end{bmatrix}  + \begin{bmatrix} \cos \theta & -\frac{\sin \theta}{r} \end{bmatrix}\begin{bmatrix} 0 & \frac{\sin \theta}{r^2}\\ \ \\ -\sin \theta & -\frac{\cos \theta}{r}\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial r}\\ \ \\ \frac{\partial F}{\partial \theta} \end{bmatrix}

Laplace’s equation is given by \frac{\partial G}{\partial x} + \frac{\partial H}{\partial y} + \frac{\partial I}{\partial z} = 0 so we need to work out the three matrix equations above and sum them. The result will be a sum involving the nine partials \frac{\partial^2 F}{\partial r^2}, \frac{\partial^2 F}{\partial \theta^2}, \frac{\partial^2 F}{\partial \phi^2}, \frac{\partial^2 F}{\partial r \partial \theta}, \frac{\partial^2 F}{\partial r \partial \phi}, \frac{\partial^2 F}{\partial \theta \partial \phi}, \frac{\partial F}{\partial r}, \frac{\partial F}{\partial \theta} and \frac{\partial F}{\partial \phi}. I found that the most convenient way to work out the sum was to compute the coefficient of each of these nine partials individually in turn. The result is

\frac{\partial G}{\partial x} + \frac{\partial H}{\partial y} + \frac{\partial I}{\partial z} =

\frac{\partial^2 F}{\partial x^2} + \frac{\partial^2 F}{\partial y^2} + \frac{\partial^2 F}{\partial z^2} =

\big \{\sin^2 \theta \cos^2 \phi + \sin^2 \theta \sin^2 \phi + \cos^2 \theta \big \} \frac{\partial^2 F}{\partial r^2}

+ \big \{ \frac{\cos^2 \theta \cos^2 \phi}{r^2} + \frac{\cos^2 \theta \sin^2 \phi}{r^2} + \frac{\sin^2 \theta}{r^2} \big \} \frac{\partial^2 F}{\partial \theta^2}

+ \big \{ \frac{\sin^2 \phi}{r^2 \sin^2 \theta} + \frac{\cos^2 \phi}{r^2 \sin^2 \theta}\big \} \frac{\partial^2 F}{\partial \phi^2}

+ \big \{ \frac{2\sin \theta \cos \theta \cos^2 \phi}{r} + \frac{2\sin \theta \cos \theta \sin^2 \phi}{r} - \frac{2\sin \theta \cos \theta}{r}\big \} \frac{\partial^2 F}{\partial r \partial \theta}

+ \big \{ -\frac{2\cos \phi \sin \phi}{r} + \frac{2\cos \phi \sin \phi}{r} \big \} \frac{\partial^2 F}{\partial r \partial \phi}

+ \big \{ -\frac{\cos \theta \cos \phi \sin \phi}{r^2 \sin \theta} -\frac{\cos \theta \cos \phi \sin \phi}{r^2 \sin \theta} + \frac{\cos \theta \cos \phi \sin \phi}{r^2 \sin \theta} + \frac{\cos \theta \cos \phi \sin \phi}{r^2 \sin \theta} \big \} \frac{\partial^2 F}{\partial \theta \partial \phi}

+ \big \{ \frac{\cos^2 \theta \cos^2 \phi}{r} + \frac{\sin^2 \phi}{r} + \frac{\cos^2 \theta \sin^2 \phi}{r} + \frac{\cos^2 \phi}{r} + \frac{\sin^2 \theta}{r}\big \} \frac{\partial F}{\partial r}

+ \big \{ -\frac{2\sin \theta \cos \theta \cos^2 \phi}{r^2} + \frac{\cos \theta \sin^2 \phi}{r^2 \sin \theta} -\frac{2\sin \theta \cos \theta \sin^2 \phi}{r^2} + \frac{\cos \theta \cos^2 \phi}{r^2 \sin \theta} + \frac{2\cos \theta \sin \theta}{r^2} \big \} \frac{\partial F}{\partial \theta}

+ \big \{ \frac{\cos \phi \sin \phi}{r^2} + \frac{(1 + \cos^2 \theta)\cos \phi \sin \phi}{r^2 \sin^2 \theta} - \frac{\cos \phi \sin \phi}{r^2} - \frac{(1 + \cos^2 \theta)\cos \phi \sin \phi}{r^2 \sin^2 \theta} \big \} \frac{\partial F}{\partial \phi}

which simplifies to

\frac{\partial^2 F}{\partial x^2} + \frac{\partial^2 F}{\partial y^2} + \frac{\partial^2 F}{\partial z^2} =

\frac{\partial^2 F}{\partial r^2} + \frac{1}{r^2} \frac{\partial^2 F}{\partial \theta^2} + \frac{1}{r^2 \sin^2 \theta} \frac{\partial^2 F}{\partial \phi^2} + \frac{2}{r} \frac{\partial F}{\partial r} + \frac{\cos \theta}{r^2 \sin \theta} \frac{\partial F}{\partial \theta}

= \frac{1}{r^2} \frac{\partial }{\partial r}\big( r^2 \frac{\partial F}{\partial r}\big) + \frac{1}{r^2 \sin \theta}\frac{\partial }{\partial \theta}\big( \sin \theta \frac{\partial F}{\partial \theta} \big) + \frac{1}{r^2 \sin^2 \theta}\frac{\partial^2 F}{\partial \phi^2}

Therefore Laplace’s equation in spherical polar coordinates is

\frac{1}{r^2} \frac{\partial }{\partial r}\big( r^2 \frac{\partial F}{\partial r}\big) + \frac{1}{r^2 \sin \theta}\frac{\partial }{\partial \theta}\big( \sin \theta \frac{\partial F}{\partial \theta} \big) + \frac{1}{r^2 \sin^2 \theta}\frac{\partial^2 F}{\partial \phi^2} = 0

as before.
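
The whole matrix calculus computation can also be verified in one stroke. In the following sympy sketch (the names rows and d_cart are my own) F is left as an unspecified function of r, \theta and \phi, each Cartesian derivative is applied via the chain rule using the rows of the inverse coefficient matrix, and the result is compared with the scale factor form of the Laplacian:

import sympy as sp

r, th, ph = sp.symbols('r theta phi', positive=True)
F = sp.Function('F')(r, th, ph)

# Rows of the inverse coefficient matrix: d/dx, d/dy, d/dz in spherical variables
rows = [[sp.sin(th)*sp.cos(ph), sp.cos(th)*sp.cos(ph)/r, -sp.sin(ph)/(r*sp.sin(th))],
        [sp.sin(th)*sp.sin(ph), sp.cos(th)*sp.sin(ph)/r,  sp.cos(ph)/(r*sp.sin(th))],
        [sp.cos(th),            -sp.sin(th)/r,            0]]

def d_cart(expr, row):
    # One Cartesian partial derivative, applied via the chain rule
    return row[0]*sp.diff(expr, r) + row[1]*sp.diff(expr, th) + row[2]*sp.diff(expr, ph)

laplacian = sum(d_cart(d_cart(F, row), row) for row in rows)

target = (sp.diff(r**2*sp.diff(F, r), r)/r**2
          + sp.diff(sp.sin(th)*sp.diff(F, th), th)/(r**2*sp.sin(th))
          + sp.diff(F, ph, 2)/(r**2*sp.sin(th)**2))

print(sp.simplify(laplacian - target))   # expect 0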


On an integral calculus approach to finding the area of a quadrarc

[Diagram: the quadrarc, shown shaded inside a square]

I recently gave a ‘stretch and challenge’ problem to some of my GCSE maths students which involved finding the area of a quadrarc. A quadrarc, shown shaded in the diagram, is the shape enclosed by four arcs having identical radius but with centres at the four corners of a square. Finding the formula for the area of a quadrarc is in principle just within the reach of a high-attaining GCSE maths student since it can be done using only plane geometry techniques covered in the later stages of a GCSE course. However, it is a challenging problem and although some of my students got quite close none of them managed to solve it fully. The same plane geometry techniques are covered more thoroughly in the foundation year of A-level maths courses so this approach should be more feasible for first-year A-level maths students.

When exploring alternative ways to solve the problem myself, I employed an integral calculus approach that struck me as also being potentially very pedagogically useful for A-level students. It involves a number of techniques from coordinate geometry and integration that are in principle within the reach of such students in the later stages of their studies, so the quadrarc problem is also eminently useful as a ‘stretch and challenge’ revision problem for high-attaining A-level maths students if it is specified that integral calculus is to be used (rather than plane geometry). I have not been able to find any integral calculus solutions to the quadrarc problem in the literature which bring out the potentially useful pedagogical features at A-level, so I want to record my solution in the present note emphasising these features. The reader should try to solve the problem of finding the area of a quadrarc for himself/herself before reading any further.

In what follows we will always take the main square within which the quadrarc lies to be of side length r. I will begin by quickly setting out a plane geometry solution which involves finding the areas of an equilateral triangle, a sector, and a segment (diagrams A, B and C below).

[Diagrams A, B, C and D: the equilateral triangle, the sector, the segment and the corner region]

Given that the main square has side length r, the area of the equilateral triangle in diagram A is (using the sine rule for area)

\frac{1}{2} r^2 \sin 60^{\circ} = \frac{\sqrt{3}}{4} r^2

The area of the sector in diagram B is

\frac{60^{\circ}}{360^{\circ}} \pi r^2 = \frac{\pi}{6} r^2

so the area of the segment shaded in diagram C is

\frac{\pi}{6} r^2 - \frac{\sqrt{3}}{4} r^2

Finally we obtain the shaded area in diagram D by deducting a triangle and two segments from the area of a quarter circle:

\frac{\pi}{4} r^2 - \frac{\sqrt{3}}{4} r^2 - 2\big( \frac{\pi}{6} r^2 - \frac{\sqrt{3}}{4} r^2 \big) = \frac{\sqrt{3}}{4} r^2 - \frac{\pi}{12} r^2

The area of the quadrarc is then the area of the square minus four of the areas in diagram D:


Area of quadrarc =

r^2 - 4 \big( \frac{\sqrt{3}}{4} r^2 - \frac{\pi}{12} r^2 \big)

= \big( \frac{\pi}{3} - (\sqrt{3} - 1) \big) r^2
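
This value, approximately 0.3151 r^2, is easy to sanity-check numerically, since the quadrarc is exactly the set of points of the square lying within distance r of all four corners. A rough Monte Carlo sketch (the sample size and seed are arbitrary choices of mine):

import math, random

random.seed(0)
r, n, hits = 1.0, 10**6, 0
corners = [(0, 0), (r, 0), (0, r), (r, r)]
for _ in range(n):
    x, y = random.uniform(0, r), random.uniform(0, r)
    # a point is in the quadrarc iff it is within distance r of every corner
    if all((x - cx)**2 + (y - cy)**2 <= r*r for cx, cy in corners):
        hits += 1
print(hits / n)                           # Monte Carlo estimate (r = 1)
print(math.pi/3 - (math.sqrt(3) - 1))     # exact value, about 0.3151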


This same formula for the area of a quadrarc, \big( \frac{\pi}{3} - (\sqrt{3} - 1) \big) r^2, can be derived using integral calculus as follows.

[Diagram: the quadrarc inside the square, with the points R, P and S labelled]

Let the origin be at the bottom left corner of the square with increasing x-direction to the right and increasing y-direction upwards. Based on the equation of a circle with centre at (r, 0) and radius r, the equation of the arc through R and P is

y_{RP} = \sqrt{r^2 - (x - r)^2}

Similarly, based on the equation of a circle with centre at (r, r), the equation of the arc through R and S is

y_{RS} = r - \sqrt{r^2 - (x - r)^2}

The x-coordinate of the point R is given by the intersection of y_{RP} and y_{RS}, i.e.,

r - \sqrt{r^2 - (x - r)^2} = \sqrt{r^2 - (x - r)^2}

\iff

4x^2 - 8xr + r^2 = 0

\implies

x = r \pm \frac{\sqrt{3}}{2}r
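
As a quick check of this quadratic (a small sympy sketch, keeping r symbolic):

import sympy as sp

x, r = sp.symbols('x r', positive=True)
y_RP = sp.sqrt(r**2 - (x - r)**2)
y_RS = r - sp.sqrt(r**2 - (x - r)**2)
print(sp.solve(sp.Eq(y_RP, y_RS), x))   # expect [r - sqrt(3)*r/2, r + sqrt(3)*r/2]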

Therefore the x-coordinate of the point R is

r \big(1 - \frac{\sqrt{3}}{2} \big)

Note that we need to discard the other solution, namely r \big(1 + \frac{\sqrt{3}}{2} \big), because it exceeds r and therefore lies outside the main square. By symmetry, the x-coordinate of both the points P and S is \frac{r}{2}. Therefore half of the area of the quadrarc should be given by the integral

\int_{r(1 - \sqrt{3}/2)}^{r/2} (y_{RP} - y_{RS}) dx = \int_{r(1 - \sqrt{3}/2)}^{r/2} (2\sqrt{r^2 - (x - r)^2} - r) dx

= 2 \int_{r(1 - \sqrt{3}/2)}^{r/2} \sqrt{r^2 - (x - r)^2} dx - \frac{(\sqrt{3} - 1)}{2}r^2

From tables of standard integrals we have that

\int\sqrt{a^2 - z^2} dz = \frac{1}{2} z \sqrt{a^2 - z^2} + \frac{1}{2} a^2 \arctan\big( \frac{z}{\sqrt{a^2 - z^2}}\big)

Therefore let’s employ the change of variable z = x - r. Then dz = dx and we have z = -\frac{\sqrt{3}}{2}r when x = r\big(1 - \frac{\sqrt{3}}{2}\big) whereas z = -\frac{r}{2} when x = \frac{r}{2}. Our integral becomes

2 \int_{- (\sqrt{3}/2)r}^{-r/2} \sqrt{r^2 - z^2} dz = \bigg[ z\sqrt{r^2 - z^2} + r^2 \arctan\big( \frac{z}{\sqrt{r^2 - z^2}}\big)\bigg]_{- (\sqrt{3}/2)r}^{-r/2}

= -\frac{r}{2}\sqrt{r^2 - \frac{r^2}{4}} + r^2 \arctan\bigg( \frac{-\frac{r}{2}}{\sqrt{r^2 - \frac{r^2}{4}}}\bigg) + \frac{\sqrt{3}}{2}r \sqrt{r^2 - \frac{3r^2}{4}} + r^2 \arctan\bigg( \frac{\frac{\sqrt{3}r}{2}}{\sqrt{r^2 - \frac{3r^2}{4}}}\bigg)

= r^2 \bigg\{ \arctan(\sqrt{3}) - \arctan\big( \frac{1}{\sqrt{3}}\big) \bigg\}

= \big\{ \frac{\pi}{3} - \frac{\pi}{6}\big\} r^2 = \frac{\pi}{6}r^2

Therefore half the area of the quadrarc is

\int_{r(1 - \sqrt{3}/2)}^{r/2} (y_{RP} - y_{RS}) dx

= 2 \int_{r(1 - \sqrt{3}/2)}^{r/2} \sqrt{r^2 - (x - r)^2} dx - \frac{(\sqrt{3} - 1)}{2}r^2

= \bigg(\frac{\pi}{6} - \frac{(\sqrt{3} - 1)}{2}\bigg)r^2

The full area of the quadrarc is then

\big( \frac{\pi}{3} - (\sqrt{3} - 1) \big) r^2

as before.
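
As a final numerical cross-check of the integral calculus route, here is a short scipy sketch (with r = 1):

import math
from scipy.integrate import quad

r = 1.0
integrand = lambda x: 2*math.sqrt(r**2 - (x - r)**2) - r   # y_RP - y_RS
half_area, _ = quad(integrand, r*(1 - math.sqrt(3)/2), r/2)
print(2*half_area)                        # numerical area of the quadrarc
print(math.pi/3 - (math.sqrt(3) - 1))     # closed form, about 0.3151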


A note on Bragg’s law for X-ray scattering

In 1912, at the age of only 22, William L. Bragg wrote a paper which essentially began the field now known as X-ray crystallography. Earlier that year, Max von Laue and others had passed X-rays through crystals of copper sulphate and zinc sulphide, obtaining patterns of dots on a photographic film. These patterns were the first evidence that X-rays could be diffracted by the regularly arranged atoms in crystals. The X-ray diffraction pattern produced by zinc sulphide was particularly sharp and clear, and seemed to show a four-fold symmetry.
[Image: X-ray diffraction pattern of zinc sulphide]

Laue attempted to explain the patterns of dots mathematically but was not completely successful. He had tried to set up the maths based on overly complicated and partially incorrect models of what was going on at the atomic level. At this time the young William L. Bragg was just finishing a mathematics degree at Cambridge University. He heard about Laue’s work from his father, William H. Bragg, who was a mathematician and physicist himself and had been working on X-rays at the University of Leeds. When thinking about the problem, the young William Bragg envisaged a much simpler model than Laue’s in which the planes in the crystal lattice acted like mirrors. The incoming X-ray beams, interpreted as electromagnetic waves with a particular wavelength \lambda, would be reflected by the planes in the crystal lattice. The outgoing beams would then interfere with one another to give either constructive or destructive interference, depending on whether the crests of the waves were in phase or out of phase. The diagram below shows the standard set-up for deriving Bragg’s law that appears in every textbook and article on this topic.

[Diagram: the standard set-up for deriving Bragg’s law]

The incident beams are at an angle \theta to the planes in the lattice and are reflected at the same angle. The extra distance travelled by the lower beam is the sum of the lengths of the two black arrows in the diagram. Elementary trigonometry shows that the length of each black arrow is \text{d} \sin \theta, so the extra distance travelled by the lower beam is 2 \text{d} \sin \theta. A black dot is produced on the photographic film only if the extra distance travelled by the lower beam is a whole number of wavelengths, since in this case there will be constructive interference of the outgoing waves and the reinforced beam will register strongly. If the extra distance travelled by the lower beam is not exactly equal to a whole number of wavelengths, there will be some X-ray beam nearby that will interfere destructively with it and they will cancel each other out. For example, the diagram below shows how the beams cancel each other out when the extra distance travelled by the lower beam is an odd multiple of half a wavelength. In this case, the scattering of the X-ray beams will not be detectable and no black spot will appear in the corresponding position on the photographic film.

[Diagram: destructive interference when the path difference is an odd number of half-wavelengths]

It follows that it is possible to calculate the positions of the black spots on the photographic film using the equation

2 \text{d} \sin \theta = \text{n} \lambda

where n is an integer. This is Bragg’s law. A black spot appears on the photographic film only at those positions where the three quantities d, \theta and \lambda satisfy the equation of Bragg’s law simultaneously; otherwise no diffraction spot will appear on the film. This is the law that the young William Bragg published in his 1912 paper. Together, the two William Braggs continued to investigate crystal structures using X-ray diffraction, and father and son were both jointly awarded the Nobel Prize in physics for this work in 1915. The young William Bragg was only twenty-five and remains the youngest ever Nobel winner for science.
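
To get a feel for the numbers involved, the following sketch computes the Bragg angles for some illustrative values of d and \lambda (the values are my own, chosen to be typical of X-ray crystallography; they are not from Bragg’s paper):

import math

d = 2.0e-10      # illustrative plane spacing in metres
lam = 1.54e-10   # illustrative X-ray wavelength in metres
for n in range(1, 4):
    s = n*lam/(2*d)        # sin(theta) from 2 d sin(theta) = n lambda
    if s <= 1:
        print(n, round(math.degrees(math.asin(s)), 1))   # Bragg angle in degrees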

This is a lovely story and the standard textbook set-up above for deriving Bragg’s law seems straightforward enough. However, a bright A-level student of mine expressed dissatisfaction with it and asked if there is some alternative approach to deriving Bragg’s law that involves an explicit subtraction operation, i.e., an approach showing that a specific distance travelled by the lower X-ray beam minus a specific distance travelled by the upper X-ray beam equals 2 \text{d} \sin \theta. I felt this was an interesting question but when we tried to find an alternative approach in the literature we noticed that every textbook and every source online that we looked at had exactly this same set-up, or something very similar to it that the student also did not like! We eventually managed to come up with a calculation of the kind the student wanted to see by varying the standard textbook set-up slightly, and this is what I want to record in the present note.

We altered the standard set-up by moving the two short perpendiculars to the X-ray beams symmetrically away from the point of incidence of the upper beam, as indicated in the following diagram.

[Diagram: the modified set-up, with the two perpendiculars moved symmetrically away from the point of incidence]

A convenient place to put the blue perpendiculars is at the points of intersection of the lower X-ray beam with the upper crystal plane, which is what I have done in the diagram above. This then allows the type of explicit subtraction operation to be carried out that the student wanted to see. All we needed to do was to demonstrate that the distance shown in red in the diagram below, minus the distance shown in green, exactly equals 2 \text{d} \sin \theta.

[Diagram: the red and green distances, with the lengths h, j and k marked]

From the diagram we see immediately that

\text{h} = \frac{\text{d}}{\sin \theta}

\frac{\text{j}}{\text{k}} = \cos \theta

\frac{\text{d}}{\text{k}} = \tan \theta

Therefore

\text{j} = \text{k} \cos \theta = \frac{\text{d}}{\tan \theta} \cos \theta

and the desired subtraction operation is then

2 \text{h} - 2 \text{j} = \frac{2 \text{d}}{\sin \theta} - \frac{2 \text{d}}{\tan \theta} \cos \theta = 2 \text{d} \bigg( \frac{1}{\sin \theta} - \frac{\cos^2 \theta}{\sin \theta} \bigg) = 2 \text{d} \sin \theta

as required. One can now imagine the blue perpendiculars sliding back together towards the centre of the diagram. The difference between the red and the green distances will always remain equal to 2 \text{d} \sin \theta as the blue perpendiculars slide inwards but the green distance will eventually shrink to zero. We will then have arrived back at the standard textbook set-up and the extra distance travelled by the lower X-ray beam will (of course) remain 2 \text{d} \sin \theta.
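
The trigonometric cancellation at the end can also be confirmed symbolically (a small sympy sketch):

import sympy as sp

d, th = sp.symbols('d theta', positive=True)
h = d/sp.sin(th)                  # each red segment
j = (d/sp.tan(th))*sp.cos(th)     # each green segment
print(sp.simplify(2*h - 2*j))     # expect 2*d*sin(theta)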

On passing from the discrete to the continuous form of Slutsky’s identity in microeconomics

I have noticed that there is a sharp ‘jump’ in the literature concerning Slutsky’s decomposition equation for the effects of a price change on demand. Elementary discussions usually employ a discrete version of Slutsky’s equation which assumes a relatively large price change. This is typically illustrated in a `pivot-shift’ type diagram which is familiar to all undergraduate microeconomics students. In more advanced (typically postgraduate) treatments, however, the discussion suddenly jumps to a full-blown partial differential equation form of Slutsky’s identity assuming an infinitesimal price change, usually expressed in terms of a Hicksian demand function. I have not been able to find any appealing discussions of how the set-up behind the discrete form of Slutsky’s equation naturally evolves into the partial differential equation form as we pass from a large price change to the limit of an infinitesimally small one. In this note I want to set out that transition explicitly.

Slutsky’s decomposition equation expresses the effects of a price change on Marshallian demand in terms of a pure substitution effect, which is always negative, and an income effect, which can be positive or negative depending on whether the good is a normal good or an inferior good respectively. I will employ a simple two-good numerical example to illustrate the discrete form of Slutsky’s equation, using a Cobb-Douglas utility function of the form

u(x, y) = x^{1/2}y^{1/2}

The mathematical problem is to find the combination of x and y which maximises this utility function subject to the budget constraint

p_1x + p_2y = m

The Cobb-Douglas utility function is globally concave and smooth so we are guaranteed to find a unique interior solution by partial differentiation. One normally proceeds by taking the natural logarithm of the utility function (this is a monotonic transformation so does not affect the preferences represented by the original utility function) and setting up the Lagrangian for the problem, namely

L = \frac{1}{2}\ln x + \frac{1}{2}\ln y + \lambda (m - p_1x -p_2y)

Taking first-order partial derivatives with respect to x, y and \lambda and setting them equal to zero we get

\frac{\partial L}{\partial x}  = \frac{1}{2x} - \lambda p_1 = 0

\frac{\partial L}{\partial y} = \frac{1}{2y} - \lambda p_2 = 0

\frac{\partial L}{\partial \lambda} = m - p_1x - p_2y = 0

This is a system of three equations in three unknowns. Dividing the first equation by the second and rearranging one obtains

\frac{y}{x} = \frac{p_1}{p_2}

Solving the third equation for x we get

x = \frac{m}{p_1} - \frac{p_2}{p_1}y

and substituting this into the equation above we get

\frac{y}{\frac{m}{p_1} - \frac{p_2}{p_1}y} = \frac{p_1}{p_2}

\iff

y = \frac{m}{2p_2}

This is the uncompensated demand function (often also called the Marshallian demand function) for good y. By symmetry, the uncompensated demand function for good x is

x = \frac{m}{2p_1}

(Note that rearranging the demand function for y we get p_2y = \frac{m}{2} which says that the consumer will spend exactly half of the income on y, and similarly for x. Whenever the Cobb-Douglas utility function is in a form in which the exponents on the goods are fractions which sum to 1, these fractions tell us the proportions of income which will be spent on the corresponding goods. Our utility function was of the form u(x, y) = x^{1/2}y^{1/2} so one-half of the total income is spent on each good, as we confirmed with the above calculation).
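
The first-order conditions above can also be solved mechanically as a check. A sympy sketch (symbol names follow the text):

import sympy as sp

x, y, lam, p1, p2, m = sp.symbols('x y lambda p_1 p_2 m', positive=True)

# Lagrangian for the log-transformed Cobb-Douglas problem
L = sp.log(x)/2 + sp.log(y)/2 + lam*(m - p1*x - p2*y)

sol = sp.solve([sp.diff(L, v) for v in (x, y, lam)], [x, y, lam], dict=True)
print(sol)   # expect x = m/(2*p_1), y = m/(2*p_2), lambda = 1/m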

To illustrate Slutsky’s decomposition of the effects of a price change into a pure substitution effect and an income effect, consider the above uncompensated demand function for x and suppose that m = \pounds 1000 while p_1 = \pounds 10. The amount of x demanded at this income and price is then

x(p_1, m) = \frac{1000}{(2)(10)}= 50

This corresponds to the amount of x in the bundle A in the diagram below.

[Diagram: Slutsky decomposition of a price rise, showing bundles A, B and C]

Now suppose that the price rises to p_1^{*} = \pounds 20.

The amount of x demanded at the original income and this new price falls to

x(p_1^{*}, m) = \frac{1000}{(2)(20)} = 25

This corresponds to the amount of x in the bundle C in the diagram.

Slutsky’s decomposition of this total change in demand begins by asking what change in income would be enough to enable the consumer to buy the original amount of x at the new price. This amount of additional income is obtained as

p_1^{*}x - p_1x = (20)(50) - (10)(50) = \pounds 500

Therefore, `compensating’ the consumer by increasing the income level from m = \pounds 1000 to m^{*} = \pounds 1500 enables them to buy their original bundle A with x = 50. This increase in the income level corresponds to a shift outwards in the new budget line to a position represented by the blue budget line in the diagram.

In the sense that the original bundle A is affordable again (so purchasing power has remained constant), the consumer is now as well off as before, but the original bundle A is no longer the utility-maximising one at the new price and the higher income level. The consumer will want to adjust the bundle until the utility function is maximised at the new price and new income. The amount of x the consumer will actually demand at the new price and new income level will be

x(p_1^{*}, m^{*}) = \frac{1500}{(2)(20)} = 37.5

This corresponds to the amount of x in the bundle B in the diagram above, and is usually referred to in the literature as the compensated demand for x (as opposed to the uncompensated demand at point A). The pure substitution effect of the price rise (i.e., abstracting from the income effect) is then the change in demand for x when the price of x changes to p_1^{*} and at the same time the income level changes to m^{*} to keep the consumer’s purchasing power constant:

x(p_1^{*}, m^{*}) - x(p_1, m) = 37.5 - 50 = -12.5

This is the change in the amount of x represented by the shift from bundle A to bundle B in the diagram above.

In this numerical example, the pure substitution effect of the price rise accounts for exactly half of the total drop in the demand for x from 50 at point A to 25 at point C. The other half of the drop in the demand for x is accounted for by the income effect (sometimes called the `wealth’ effect) of the price rise, which is represented in the diagram above by a parallel shift inwards of the blue budget line to the position of the final budget line on which bundle C lies. This is the change in demand for x when we change the income level from m^{*} back to m, holding the price of x fixed at p_1^{*}. Thus, the income effect is computed as

x(p_1^{*}, m) - x(p_1^{*}, m^{*}) = 25 - 37.5 = -12.5

The substitution effect plus the income effect together account for the full drop in the demand for x as a result of moving from bundle A to bundle C in response to the price rise of x.

In this simple numerical example Slutsky’s decomposition equation takes the discrete form

x(p_1^{*}, m) - x(p_1, m) = \big \{ x(p_1^{*}, m^{*}) - x(p_1, m) \big \} + \big\{ x(p_1^{*} , m) - x(p_1^{*}, m^{*}) \big \}
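
The numerical example above satisfies this identity exactly, which can be confirmed in a few lines (a sketch restating the numbers already computed):

def x_demand(p1, m):
    # Marshallian demand for x under u = x^(1/2) * y^(1/2)
    return m / (2 * p1)

p1, p1_star, m = 10, 20, 1000
m_star = m + (p1_star - p1) * x_demand(p1, m)   # compensated income: 1500

total        = x_demand(p1_star, m) - x_demand(p1, m)             # -25.0
substitution = x_demand(p1_star, m_star) - x_demand(p1, m)        # -12.5
income       = x_demand(p1_star, m) - x_demand(p1_star, m_star)   # -12.5
print(total == substitution + income)   # expect True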

As we pass to the limit of an infinitesimally small price change, however, Slutsky’s decomposition equation takes the form of a partial differential equation which can be derived quite naturally from the discrete form by the following arguments. Suppose we start at point A in the above diagram again, and suppose the price changes by an infinitesimal amount \delta, i.e.,

p_1 \longrightarrow p_1 + \delta

The (infinitesimal) amount of income needed to compensate the consumer so that the bundle at A remains affordable after the price change is

(p_1 + \delta ) x - p_1 x = \delta x

We can then rewrite the discrete form of Slutsky’s equation as

x(p_1 + \delta, \ m) - x(p_1, \ m) =

\big \{ x(p_1 + \delta, \ m + \delta x) - x(p_1, \ m) \big \} + \big \{x(p_1 + \delta, \ m) - x(p_1 + \delta, \ m + \delta x) \big \}

\iff

x(p_1 + \delta, \ m) - x(p_1, \ m) =

\big \{ x(p_1 + \delta, \ m + \delta x) - x(p_1, \ m) \big \} - \big \{x(p_1 + \delta, \ m + \delta x) - x(p_1 + \delta, \ m) \big \}

Dividing through by \delta and taking the limit as \delta \rightarrow 0 we get

\frac{\partial x}{\partial p_1} = \lim_{\delta \to 0} \frac{x(p_1 + \delta, \ m + \delta x) - x(p_1, \ m)}{\delta} - \lim_{\delta \to 0} \frac{x(p_1 + \delta, \ m + \delta x) - x(p_1 + \delta, \ m)}{\delta}

The second term on the right-hand side, the income effect, can be written as

\lim_{\delta \to 0} \frac{x(p_1 + \delta, \ m + \delta x) - x(p_1 + \delta, \ m)}{\delta} = x \lim_{\delta x \to 0} \frac{x(p_1 + \delta, \ m + \delta x) - x(p_1 + \delta, \ m)}{\delta x} = x \frac{\partial x}{\partial m}

Our partial differential equation form of Slutsky’s identity has then so far evolved to

\frac{\partial x}{\partial p_1} = \lim_{\delta \to 0} \frac{x(p_1 + \delta, \ m + \delta x) - x(p_1, \ m)}{\delta} - x \frac{\partial x}{\partial m}

Now consider the first term on the right-hand side, namely

\lim_{\delta \to 0} \frac{x(p_1 + \delta, \ m + \delta x) - x(p_1, \ m)}{\delta}

This is the effect of the (infinitesimal) price change when we eliminate the income effect by changing the income to m + \delta x, i.e., it is the pure price substitution effect. What we are going to do is replace the two terms in the numerator by corresponding Hicksian demand functions, usually denoted by the letter h. Consider first the original demand function x(p_1, \ m) at point A in the diagram above. This was obtained by maximising utility subject to the price p_1 and income level m. However, we can also regard it as having been obtained by solving the dual problem of minimising the expenditure required to achieve the level of utility u(x^A, y^A) associated with the bundle A in the diagram above, given the price p_1. We would find that the minimised expenditure level would be

E(p_1, \ u(x^A, y^A)) = m

and the amount of x demanded would be the same as the original amount:

x(p_1, \ m) = h(p_1, \ E(p_1, \ u(x^A, y^A)))

Now consider the problem of minimising the expenditure required to achieve the level of utility u(x^A, y^A) given the changed price p_1 + \delta. The minimised expenditure level would be

E(p_1 + \delta, \ u(x^A, y^A))

Now, it is clearly the case that this minimised expenditure is `sandwiched’ between m and m + \delta x, i.e.,

m < E(p_1 + \delta, \ u(x^A, y^A)) < m + \delta x

so the difference between the three quantities in the inequality must become vanishingly small as \delta \rightarrow 0. Therefore as long as we are taking limits as \delta \rightarrow 0, we can replace x(p_1 + \delta, \ m + \delta x) with the Hicksian demand function

h(p_1 + \delta, \ E(p_1 + \delta, \ u(x^A, y^A)))

(What I am basically saying here is that as \delta \rightarrow 0, we can replace the point B in the diagram above with a point that is on the same indifference curve as the original bundle at point A. In this case the consumer would be compensated not by returning them to the original purchasing power after the price change, but to the original level of utility they were experiencing. As \delta becomes infinitesimally small, it makes no difference which of these two points we think about for the pure substitution effect. When the price change is large, these two points diverge and we then have a distinction between a Slutsky substitution effect which involves restoring the original purchasing power, and a Hicksian substitution effect which involves restoring the original utility level. The distinction between these two disappears when the price change is infinitesimal). We then have

\lim_{\delta \to 0} \frac{x(p_1 + \delta, \ m + \delta x) - x(p_1, \ m)}{\delta} = \lim_{\delta \to 0} \frac{h(p_1 + \delta, \ E(p_1 + \delta, \ u(x^A, y^A))) - h(p_1, E(p_1, \ u(x^A, y^A)))}{\delta}

=

\frac{\partial h}{\partial p_1}

We now have the final partial differential equation form of Slutsky’s decomposition equation in the way it is usually written using the Hicksian demand function:

\frac{\partial x}{\partial p_1} = \frac{\partial h}{\partial p_1} - x \frac{\partial x}{\partial m}

As before, this says that the total effect of a price change is composed of a pure substitution effect (with income adjusted to exactly compensate the consumer for the wealth effect of the price change) and an income effect.

To check that this partial differential equation works in the context of our Cobb-Douglas example above, we can compute the partial derivatives explicitly. Since the demand function with the above Cobb-Douglas preferences would be

x = \frac{m}{2p_1}

we have

\frac{\partial x}{\partial p_1} = - \frac{m}{2(p_1)^2}

By solving a simple expenditure minimisation problem it can easily be shown that

h = \frac{(p_2)^{1/2}}{(p_1)^{1/2}} u

(The expenditure minimisation problem would be to minimise E = p_1 x + p_2 y subject to x^{1/2} y^{1/2} = u. Solving the constraint equation for y and substituting into the objective function gives an unconstrained minimisation problem in the variable x only. Solving this yields the above expression for the Hicksian demand function h).

Therefore we also have

\frac{\partial h}{\partial p_1} = -\frac{1}{2}\frac{(p_2)^{1/2}}{(p_1)^{3/2}} u

= -\frac{1}{2}\frac{(p_2)^{1/2}}{(p_1)^{3/2}}\big(\frac{m}{2p_1}\big)^{1/2}\big(\frac{m}{2p_2}\big)^{1/2}

= -\frac{m}{4(p_1)^2}

and

-\frac{\partial x}{\partial m}\cdot x = -\frac{1}{2p_1}\frac{m}{2p_1} = -\frac{m}{4(p_1)^2}

Putting these into the partial differential form of Slutsky’s equation we see that the equation is satisfied.
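
This final check can also be carried out symbolically (a sympy sketch; note that u must be held fixed while differentiating the Hicksian demand, and only afterwards evaluated at the chosen bundle):

import sympy as sp

p1, p2, m, u = sp.symbols('p_1 p_2 m u', positive=True)

x = m/(2*p1)                  # Marshallian demand for x
y = m/(2*p2)                  # Marshallian demand for y
h = sp.sqrt(p2/p1)*u          # Hicksian demand for x at fixed utility u

dh_dp1 = sp.diff(h, p1).subs(u, sp.sqrt(x*y))   # hold u fixed, then evaluate at the bundle
lhs = sp.diff(x, p1)                            # own-price effect
rhs = dh_dp1 - x*sp.diff(x, m)                  # substitution effect minus income effect

print(sp.simplify(lhs - rhs))   # expect 0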