# Illustrating the correspondence between 1-forms and vectors using de Broglie waves

I was asked by a student to clarify the issues surrounding an exercise in the famous book Gravitation written by Misner, Thorne and Wheeler (MTW). The exercise appears as follows in Chapter 2, Section 5:

The key point of this section is that Equation (2.14), the defining equation of 1-forms, can be shown to be physically valid (as well as being just a mathematical definition) using de Broglie waves in quantum mechanics. The notation in MTW is not ideal, so we will replace the notation $\langle \mathbf{\tilde{p}}, \mathbf{v} \rangle$ for a 1-form evaluated at a vector $\mathbf{v}$ by the notation $\mathbf{\tilde{p}}(\mathbf{v})$. What MTW are then saying is that given any vector $\mathbf{p}$ we can define a corresponding 1-form as

$\mathbf{\tilde{p}} = \langle \mathbf{p}, \ \rangle$

which is to be viewed as a function waiting for a vector input (to be placed in the empty space on the right-hand side of the angle brackets). When the vector input $\mathbf{v}$ is supplied, the 1-form will then yield the number

$\mathbf{\tilde{p}}(\mathbf{v}) = \langle \mathbf{p}, \mathbf{v} \rangle = \mathbf{p} \cdot \mathbf{v}$

In Exercise 2.1 we are asked to verify the validity of this equation using the de Broglie wave

$\psi = e^{i \phi} = \exp[i(\mathbf{k}\cdot \mathbf{x}- \omega t)]$

The phase is the angular argument $\phi = \mathbf{k}\cdot \mathbf{x} - \omega t$, which specifies how far the wave has advanced relative to some reference point. The phase is parameterised by the wave vector $\mathbf{k}$, which is such that $|\mathbf{k}| = 2 \pi/\lambda$ where $\lambda$ is the wavelength, and by the angular frequency $\omega = 2 \pi f$ where $f$ is the frequency of the relevant oscillator.

It is a well-known fact (and easy to verify) that given any real-valued function of a vector $\phi(\mathbf{x})$, the gradient vector $\partial \phi/\partial \mathbf{x}$ is orthogonal to the level surfaces of $\phi$. In the case of the phase of a de Broglie wave we have

$\frac{\partial \phi}{\partial \mathbf{x}} = \mathbf{k}$

so the wave vector is the (position) gradient vector of the phase $\phi$ and therefore $\mathbf{k}$ must be orthogonal to loci of constant phase.

In the case of circular waves, for example, these loci of constant phase are circles with centre at the source of the waves and the wave vectors $\mathbf{k}$ point radially outwards at right angles to them, as indicated in the diagram.

To get a diagrammatic understanding of the relationship between 1-forms and vectors, we can imagine focusing on a very small neighbourhood around some point located among these loci of constant phase. On this very small scale, the loci of constant phase will look flat rather than circular, but the wave vectors $\mathbf{k}$ will still be orthogonal to them. What we do is interpret this local pattern of (flat) surfaces of constant phase as the 1-form $\mathbf{\tilde{k}}$. This 1-form corresponding to the wave vector $\mathbf{k}$ is

$\mathbf{\tilde{k}} = \langle \mathbf{k}, \ \rangle$

and as before we interpret this as a function waiting for a vector input. When it receives a vector input, say $\mathbf{v}$, it will output a number computed as the scalar product of $\mathbf{k}$ and $\mathbf{v}$. Thus we can write

$\mathbf{\tilde{k}}(\mathbf{v}) = \langle \mathbf{k}, \mathbf{v} \rangle = \mathbf{k} \cdot \mathbf{v}$

As indicated in the diagram, the vector $\mathbf{v}$ which we supply to $\mathbf{\tilde{k}}$ will be at an angle to the wave vector $\mathbf{k}$. If the vector $\mathbf{v}$ is parallel to the loci of constant phase then $\mathbf{\tilde{k}}(\mathbf{v}) = 0$ because $\mathbf{k}$ and $\mathbf{v}$ will be orthogonal. In the language of 1-forms, this would be interpreted by saying that the vector $\mathbf{v}$ will not pierce the 1-form $\mathbf{\tilde{k}}$ because it will not cross any of the loci of constant phase. Conversely, if the vector $\mathbf{v}$ is parallel to the wave vector $\mathbf{k}$ (orthogonal to the loci of constant phase), we would say that $\mathbf{v}$ will pierce the 1-form $\mathbf{\tilde{k}}$ as much as possible, because it will cross as many loci of constant phase as it possibly can. Between these extremes we will get intermediate values of the 1-form $\mathbf{\tilde{k}}(\mathbf{v})$. The key idea, then, is that the set of loci of constant phase in the neighbourhood of a point is the diagrammatic representation of the 1-form $\mathbf{\tilde{k}}$. When we feed a vector $\mathbf{v}$ into this 1-form we get a measure $\mathbf{\tilde{k}}(\mathbf{v})$ of how many loci of constant phase the vector pierces. This is the language being used by MTW in the prelude to Exercise 2.1 above.
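To make the piercing picture concrete, here is a minimal numerical sketch (the wave vector and test vectors are my own illustrative choices, not from MTW) evaluating $\mathbf{\tilde{k}}(\mathbf{v}) = \mathbf{k} \cdot \mathbf{v}$ for the three situations just described:

```python
import numpy as np

# Illustrative 2D wave vector (my own choice): |k| = 2*pi/lambda with lambda = 1,
# pointing along the x-axis, so surfaces of constant phase are vertical lines.
k = np.array([2 * np.pi, 0.0])

def one_form_k(v):
    """Evaluate the 1-form k-tilde on a vector v: k~(v) = k . v."""
    return np.dot(k, v)

# Unit vector parallel to the surfaces of constant phase (orthogonal to k)
v_parallel_surfaces = np.array([0.0, 1.0])
# Unit vector along k, i.e. orthogonal to the surfaces of constant phase
v_along_k = np.array([1.0, 0.0])
# Unit vector at 60 degrees to k
theta = np.pi / 3
v_intermediate = np.array([np.cos(theta), np.sin(theta)])

print(one_form_k(v_parallel_surfaces))  # 0.0: pierces no surfaces
print(one_form_k(v_along_k))            # 2*pi: maximal piercing
print(one_form_k(v_intermediate))      # pi: an intermediate value
```

The three outputs trace out the $|\mathbf{k}||\mathbf{v}| \cos\theta$ behaviour described above: zero piercing for a vector lying in a surface of constant phase, maximal piercing along $\mathbf{k}$, and intermediate values in between.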

To actually solve Exercise 2.1, begin by recalling from quantum mechanics that a photon’s momentum $\mathbf{p}$ is such that $|\mathbf{p}| = E/c$ where $E = hf$ is the photon’s energy and $f$ is the frequency of the oscillator. Since $\lambda f = c$ where $\lambda$ is the photon’s wavelength, we have $E = hc/\lambda$ so the magnitude of the photon’s momentum is

$|\mathbf{p}| = \frac{E}{c} = \frac{h}{\lambda} = \hbar \frac{2\pi}{\lambda} = \hbar |\mathbf{k}|$

and in fact

$\mathbf{p} = \hbar \mathbf{k}$

Note that therefore

$\lambda = \frac{h}{|\mathbf{p}|}$

Famously, de Broglie’s idea in his 1924 PhD thesis was that this wavelength formula applies not just to photons but also to massive particles such as electrons, for which the momentum $\mathbf{p}$ would be calculated as

$\mathbf{p} = m \mathbf{u}$

where $m$ is the mass of the particle and $\mathbf{u}$ is its four-velocity in Minkowski spacetime. Note that this four-velocity is such that $\mathbf{u}\cdot\mathbf{u} = -1$ in units where $c = 1$ (easily demonstrated using the $-+++$ metric of Minkowski spacetime).

Thus we have

$\mathbf{p} = m \mathbf{u} = \hbar \mathbf{k}$

so

$\mathbf{u} = \frac{\hbar}{m} \mathbf{k}$
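As a quick numerical aside, the wavelength formula $\lambda = h/|\mathbf{p}|$ is easy to evaluate. The sketch below uses the nonrelativistic momentum $|\mathbf{p}| = mv$ (adequate for speeds much less than $c$, where the spatial part of $m\mathbf{u}$ reduces to the familiar $m\mathbf{v}$) and an electron speed chosen purely for illustration:

```python
# Rough nonrelativistic estimate of a de Broglie wavelength, lambda = h/|p|,
# with |p| = m*v (the spatial part of m*u for speeds much less than c).
# Constants are the CODATA values; the electron speed is an arbitrary example.
h = 6.62607015e-34      # Planck constant, J s
m_e = 9.1093837015e-31  # electron mass, kg

def de_broglie_wavelength(mass, speed):
    """lambda = h / (m v), valid in the nonrelativistic regime."""
    return h / (mass * speed)

lam = de_broglie_wavelength(m_e, 1.0e6)  # electron moving at 10^6 m/s
print(lam)  # roughly 7.3e-10 m, i.e. a few angstroms
```

Wavelengths of a few angstroms are comparable to atomic spacings in crystals, which is why electron diffraction experiments were able to confirm de Broglie's hypothesis.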

In the prelude to Exercise 2.1, MTW say

relabel the surfaces of $\mathbf{\tilde{k}}$ by $\hbar \times \text{phase}$, thereby obtaining the momentum 1-form $\mathbf{\tilde{p}}$. Pierce this 1-form with any vector $\mathbf{v}$, and find the result that $\mathbf{p} \cdot \mathbf{v} = \mathbf{\tilde{p}}(\mathbf{v})$.

Following the authors’ instructions, we relabel the surfaces of $\mathbf{\tilde{k}}$ (i.e., the loci of constant phase) by multiplying by $\hbar$ to get the 1-form

$\mathbf{\tilde{p}} = \hbar \mathbf{\tilde{k}} = \hbar \langle \mathbf{k}, \ \rangle$

As usual, this 1-form is a linear function waiting for a vector input. Supplying the input $\mathbf{v}$ we then get

$\mathbf{\tilde{p}}(\mathbf{v}) = \hbar \langle \mathbf{k}, \mathbf{v} \rangle = \hbar \mathbf{k} \cdot \mathbf{v}$

But this is exactly what we get when we work out $\mathbf{p} \cdot \mathbf{v}$ since

$\mathbf{p} \cdot \mathbf{v} = m \mathbf{u} \cdot \mathbf{v} = m \frac{\hbar}{m} \mathbf{k} \cdot \mathbf{v} = \hbar \mathbf{k} \cdot \mathbf{v}$

Thus, we have solved Exercise 2.1 by showing that $\mathbf{p} \cdot \mathbf{v} = \mathbf{\tilde{p}}(\mathbf{v})$ is in accord with the quantum mechanical properties of de Broglie waves, as claimed by MTW.

# Notes on Sturm-Liouville theory

Sturm-Liouville theory was developed in the 19th century in the context of solving differential equations. When one studies it in depth, however, one experiences a sudden realisation that this is the mathematics underlying a lot of quantum mechanics. In quantum mechanics we envisage a quantum state (a time-dependent function) expressed as a superposition of eigenfunctions of a self-adjoint operator (usually referred to as a Hermitian operator) representing an observable. The coefficients of the eigenfunctions in this superposition are probability amplitudes. A measurement of the observable quantity represented by the Hermitian operator produces one of the eigenvalues of the operator with a probability equal to the square of the probability amplitude attached to the eigenfunction corresponding to that eigenvalue in the superposition. It is the fact that the operator is self-adjoint that ensures the eigenvalues are real (and thus observable), and furthermore, that the eigenfunctions corresponding to the eigenvalues form a complete and orthogonal set of functions enabling quantum states to be represented as a superposition in the first place (i.e., an eigenfunction expansion akin to a Fourier series).

The Sturm-Liouville theory of the 19th century has essentially this same structure, and in fact Sturm-Liouville eigenvalue problems are important more generally in mathematical physics precisely because they frequently arise in attempting to solve commonly-encountered partial differential equations (e.g., Poisson’s equation, the diffusion equation, the wave equation, etc.), particularly when the method of separation of variables is employed.

I want to get an overview of Sturm-Liouville theory in the present note and will begin by considering a nice discussion of a vibrating string problem in Courant & Hilbert’s classic text, Methods of Mathematical Physics (Volume I). Although the problem is simple and the treatment in Courant & Hilbert a bit terse, it (remarkably) brings up a lot of the key features of Sturm-Liouville theory which apply more generally in a wide variety of physics problems. I will then consider Sturm-Liouville theory in a more general setting emphasising the role of the Sturm-Liouville differential operator, and finally I will illustrate further the occurrence of Sturm-Liouville systems in physics by looking at the eigenvalue problems encountered when solving Schrödinger’s equation for the hydrogen atom.

On page 287 of Volume I, Courant & Hilbert write the following:

Equation (12) here is the one-dimensional wave equation

$\frac{\partial^2 u}{\partial x^2} = \mu^2 \frac{\partial^2 u}{\partial t^2}$

which (as usual) the authors are going to solve by using a separation of variables of the form

$u(x, t) = v(x) g(t)$

As Courant & Hilbert explain, the problem then involves finding the function $v(x)$ by solving the second-order homogeneous linear differential equation

$\frac{\mathrm{d}^2 v}{\mathrm{d} x^2} + \lambda v = 0$

subject to the boundary conditions

$v(0) = v(\pi) = 0$

Although not explicitly mentioned by Courant & Hilbert at this stage, equations (13) and (13a) in fact constitute a full-blown Sturm-Liouville eigenvalue problem. Despite being very simple, this setup captures many of the typical features encountered in a wide variety of such problems in physics. It is instructive to explore the text underneath equation (13a):

Not all these requirements can be fulfilled for arbitrary values of the constant $\lambda$.

… the boundary conditions can be fulfilled if and only if $\lambda = n^2$ is the square of an integer $n$.

To clarify this, we can try to solve (13) and (13a) for the three possible cases: $\lambda < 0$, $\lambda = 0$ and $\lambda > 0$. Suppose first that $\lambda < 0$. Then $-\lambda > 0$ and the auxiliary equation for (13) is

$D^2 = - \lambda$

$\implies$

$D = \pm \sqrt{- \lambda}$

Thus, we can write the general solution of (13) in this case as

$v = \alpha e^{\sqrt{-\lambda} x} + \beta e^{-\sqrt{-\lambda} x} = A \mathrm{cosh} \big(\sqrt{-\lambda} x\big) + B \mathrm{sinh} \big(\sqrt{-\lambda} x\big)$

where $A$ and $B$ are constants to be determined from the boundary conditions. From the boundary condition $v(0) = 0$ we conclude that $A = 0$ so the equation reduces to

$v = B \mathrm{sinh} \big(\sqrt{-\lambda} x\big)$

But from the boundary condition $v(\pi) = 0$ we are forced to conclude that $B = 0$ since $\mathrm{sinh} \big(\sqrt{-\lambda} \pi\big) \neq 0$. Therefore there is only the trivial solution $v(x) = 0$ in the case $\lambda < 0$.

Next, suppose that $\lambda = 0$. Then equation (13) reduces to

$\frac{\mathrm{d}^2 v}{\mathrm{d} x^2} = 0$

$\implies$

$v = A + Bx$

From the boundary condition $v(0) = 0$ we must conclude that $A = 0$, and the boundary condition $v(\pi) = 0$ means we are also forced to conclude that $B = 0$. Thus, again, there is only the trivial solution $v(x) = 0$ in the case $\lambda = 0$.

We see that nontrivial solutions can only be obtained when $\lambda > 0$. In this case we have $-\lambda < 0$ and the auxiliary equation is

$D^2 = - \lambda$

$\implies$

$D = \pm i \sqrt{\lambda}$

Thus, we can write the general solution of (13) in this case as

$v = \alpha e^{i \sqrt{\lambda} x} + \beta e^{- i \sqrt{\lambda} x} = A \mathrm{cos} \big(\sqrt{\lambda} x\big) + B \mathrm{sin} \big(\sqrt{\lambda} x\big)$

where $A$ and $B$ are again to be determined from the boundary conditions. From the boundary condition $v(0) = 0$ we conclude that $A = 0$ so the equation reduces to

$v = B \mathrm{sin} \big(\sqrt{\lambda} x\big)$

But from the boundary condition $v(\pi) = 0$ we conclude that, if $B \neq 0$, then $\sqrt{\lambda} = n$ where $n = 1, 2, 3, \ldots$. Thus, the eigenvalues of this Sturm-Liouville problem are $\lambda_n = n^2$ for $n = 1, 2, 3, \ldots$, and the corresponding eigenfunctions are $v = B \mathrm{sin}\big(n x\big)$. The coefficient $B$ is undetermined and must be specified through some normalisation process, for example by setting the integral of $v^2$ between $0$ and $\pi$ equal to $1$ and then finding the value of $B$ consistent with this. Courant & Hilbert have (implicitly) simply set $B = 1$.
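The eigenvalues $\lambda_n = n^2$ can also be recovered numerically. The following sketch (a standard central-difference discretisation, my own choice of method, not from Courant & Hilbert) approximates $-v^{\prime\prime} = \lambda v$ with $v(0) = v(\pi) = 0$ by a symmetric matrix eigenvalue problem:

```python
import numpy as np

# Central-difference discretisation of -v'' = lambda v on (0, pi) with
# v(0) = v(pi) = 0: the interior values satisfy D2 v = lambda v, where D2 is
# the (symmetric) second-difference matrix divided by h^2.
N = 500                        # number of interior grid points
h = np.pi / (N + 1)
main = 2.0 * np.ones(N) / h**2
off = -1.0 * np.ones(N - 1) / h**2
D2 = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)

eigvals, eigvecs = np.linalg.eigh(D2)  # symmetric matrix, eigenvalues ascending
print(eigvals[:4])  # approximately 1, 4, 9, 16, i.e. lambda_n = n^2
```

The columns of `eigvecs` approximate the eigenfunctions $\sin(nx)$ sampled on the grid, up to normalisation.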

Some features of this solution are typical of Sturm-Liouville eigenvalue problems in physics more generally. For example, the eigenvalues are real (rather than complex) numbers, there is a minimum eigenvalue ($\lambda_1 = 1$) but not a maximum one, and for each eigenvalue there is a unique eigenfunction (up to a multiplicative constant). Also, importantly, the eigenfunctions here form a complete and orthogonal set of functions. Orthogonality refers to the fact that the integral of a product of any two distinct eigenfunctions over the interval $(0, \pi)$ is zero, i.e.,

$\int_0^{\pi} \mathrm{sin}(nx) \mathrm{sin}(mx) \mathrm{d} x = 0$

for $n \neq m$, as can easily be demonstrated in the same way as in the theory of Fourier series. Completeness refers to the fact that over the interval $(0, \pi)$ the infinite set of functions $\mathrm{sin} (nx)$, $n = 1, 2, 3, \ldots$, can be used to represent any sufficiently well behaved function $f(x)$ using a Fourier series of the form

$f(x) = \sum_{n=1}^{\infty} a_n \mathrm{sin} (nx)$
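The orthogonality relation in particular is easy to confirm numerically; a minimal check using a midpoint-rule approximation of the integrals:

```python
import numpy as np

# Midpoint-rule check of orthogonality on (0, pi): the integral of
# sin(n x) sin(m x) is 0 for n != m and pi/2 for n = m.
M = 100000
x = (np.arange(M) + 0.5) * (np.pi / M)  # midpoint grid on (0, pi)
dx = np.pi / M

def overlap(n, m):
    return np.sum(np.sin(n * x) * np.sin(m * x)) * dx

print(overlap(2, 3))  # ~ 0 (orthogonal)
print(overlap(2, 2))  # ~ pi/2 (normalisation integral)
```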

All of this is alluded to (without explicit explanation at this stage) in the subsequent part of this section of Courant & Hilbert’s text, where they go on to provide the general solution of the vibrating string problem. They write the following:

The properties of completeness and orthogonality of the eigenfunctions are again a typical feature of the solutions of Sturm-Liouville eigenvalue problems more generally, and this is one of the main reasons why Sturm-Liouville theory is so important to the solution of physical problems involving differential equations. To get a better understanding of this, I will now develop Sturm-Liouville theory in a more general setting by starting with a standard second-order homogeneous linear differential equation of the form

$\alpha(x) \frac{\mathrm{d}^2 y}{\mathrm{d} x^2} + \beta(x) \frac{\mathrm{d} y}{\mathrm{d} x} + \gamma(x) y = 0$

where the variable $x$ is confined to an interval $a \leq x \leq b$.

Let

$p(x) = \mathrm{exp} \bigg(\int \mathrm{d} x \frac{\beta(x)}{\alpha(x)}\bigg)$

$q(x) = \frac{\gamma(x)}{\alpha(x)} p(x)$

Dividing the differential equation by $\alpha(x)$ and multiplying through by $p(x)$ we get

$p(x) \frac{\mathrm{d}^2 y}{\mathrm{d} x^2} + \frac{\beta(x)}{\alpha(x)} p(x) \frac{\mathrm{d} y}{\mathrm{d} x} + \frac{\gamma(x)}{\alpha(x)} p(x) y = 0$

$\iff$

$\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} y}{\mathrm{d} x} \bigg) + q(x) y = 0$

$\iff$

$L y = 0$

where

$L = \frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} }{\mathrm{d} x} \bigg) + q(x)$

is called the Sturm-Liouville differential operator. Thus, we see already that a wide variety of second-order differential equations encountered in physics will be able to be put into a form involving the operator $L$, so results concerning the properties of $L$ will have wide applicability.
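As a concrete illustration of this reduction (the example equation is my own choice, not from the text), take Hermite's equation $y^{\prime\prime} - 2xy^{\prime} + \lambda y = 0$, for which $\alpha(x) = 1$ and $\beta(x) = -2x$, giving $p(x) = e^{-x^2}$. A short symbolic check that multiplying through by $p$ produces the Sturm-Liouville form:

```python
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')(x)

# Hermite's equation y'' - 2x y' + lam y = 0 has alpha(x) = 1, beta(x) = -2x
alpha = sp.Integer(1)
beta = -2 * x
p = sp.exp(sp.integrate(beta / alpha, x))   # p(x) = exp(-x^2)

# Check that (p y')' equals p * (y'' + (beta/alpha) y'), i.e. that multiplying
# the equation through by p puts its first two terms into Sturm-Liouville form
lhs = sp.diff(p * sp.diff(y, x), x)
rhs = p * (sp.diff(y, x, 2) + (beta / alpha) * sp.diff(y, x))
print(sp.simplify(lhs - rhs))  # 0
```

Here the eigenvalue term $\lambda y$ plays the role of $\gamma(x) y$, so the weight function picked up by the reduction is $w(x) = e^{-x^2}$.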

Using the Sturm-Liouville operator we can now write the defining differential equation of Sturm-Liouville theory in an eigenvalue-eigenfunction format that is very reminiscent of the setup in quantum mechanics outlined at the start of this note. The defining differential equation is

$L \phi = - \lambda w \phi$

where $w(x)$ is a real-valued positive weight function and $\lambda$ is an eigenvalue corresponding to the eigenfunction $\phi$. This differential equation is often written out in full as

$\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} \phi}{\mathrm{d} x} \bigg) + \big(q(x) + \lambda w(x)\big) \phi = 0$

with $x \in [a, b]$. In Sturm-Liouville problems, the functions $p(x)$, $q(x)$ and $w(x)$ are specified at the start and, crucially, the function $\phi$ is required to satisfy particular boundary conditions at $a$ and $b$. The boundary conditions are a key aspect of each Sturm-Liouville problem; for a given form of the differential equation, different boundary conditions can produce very different problems. Solving a Sturm-Liouville problem involves finding the values of $\lambda$ for which there exist non-trivial solutions of the defining differential equation above subject to the specified boundary conditions. The vibrating string problem in Courant & Hilbert (discussed above) is a simple example. We obtain the differential equation (13) in that problem by setting $p(x) = 1$, $q(x) = 0$ and $w(x) = 1$ in the defining Sturm-Liouville differential equation.

We would now like to prove that the eigenvalues in a Sturm-Liouville problem will always be real and that the eigenfunctions will form an orthogonal set of functions, as claimed earlier. To do this, we need to consider a few more developments. In Sturm-Liouville theory we can apply $L$ to both real and complex functions, and a key role is played by the concept of the inner product of such functions. Using the notation $f(x)^{*}$ to denote the complex conjugate of the function $f(x)$, we define the inner product of two functions $f$ and $g$ over the interval $a \leq x \leq b$ as

$(f, g) = \int_a^b \mathrm{d} x f(x)^{*} g(x)$

and we define the weighted inner product as

$(f, g)_w = \int_a^b \mathrm{d} x w(x) f(x)^{*} g(x)$

where $w(x)$ is the real-valued positive weight function mentioned earlier. A key result in the theory is Lagrange’s identity, which says that for any two complex-valued functions of a real variable $u(x)$ and $v(x)$, we have

$v(Lu)^{*} - u^{*} Lv = \frac{\mathrm{d}}{\mathrm{d} x} \bigg[p(x) \bigg(v \frac{\mathrm{d}u^{*}}{\mathrm{d}x} - u^{*}\frac{\mathrm{d}v}{\mathrm{d}x}\bigg) \bigg]$

This follows from the form of $L$, since

$v(Lu)^{*} - u^{*} Lv = v\bigg[\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d}u^{*}}{\mathrm{d} x} \bigg) + q(x) u^{*}\bigg] - u^{*} \bigg[\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} v}{\mathrm{d} x} \bigg) + q(x) v\bigg]$

$= v \frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d}u^{*}}{\mathrm{d} x} \bigg) - u^{*} \frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} v}{\mathrm{d} x} \bigg)$

$= v \frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d}u^{*}}{\mathrm{d} x} \bigg) + \frac{\mathrm{d} v}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d}u^{*}}{\mathrm{d} x} \bigg) - u^{*} \frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} v}{\mathrm{d} x} \bigg) - \frac{\mathrm{d} u^{*}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} v}{\mathrm{d} x} \bigg)$

$= \frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) v \frac{\mathrm{d}u^{*}}{\mathrm{d} x} \bigg) - \frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) u^{*} \frac{\mathrm{d} v}{\mathrm{d} x} \bigg)$

$= \frac{\mathrm{d}}{\mathrm{d} x} \bigg[p(x) \bigg(v \frac{\mathrm{d}u^{*}}{\mathrm{d}x} - u^{*}\frac{\mathrm{d}v}{\mathrm{d}x}\bigg) \bigg]$

Using the inner product notation, we can write Lagrange’s identity in an alternative form that reveals the crucial role played by the boundary conditions in a Sturm-Liouville problem. We have

$(Lu, v) - (u, Lv) = \int_a^b (Lu)^{*} v \mathrm{d} x - \int_a^b u^{*} Lv \mathrm{d} x$

$= \int_a^b \frac{\mathrm{d}}{\mathrm{d} x} \bigg[p(x) \bigg(v \frac{\mathrm{d}u^{*}}{\mathrm{d}x} - u^{*}\frac{\mathrm{d}v}{\mathrm{d}x}\bigg) \bigg] \mathrm{d} x$

$= \int_a^b \mathrm{d} \bigg[p(x) \bigg(v \frac{\mathrm{d}u^{*}}{\mathrm{d}x} - u^{*}\frac{\mathrm{d}v}{\mathrm{d}x}\bigg) \bigg]$

$= \bigg[p(x) \bigg(v \frac{\mathrm{d}u^{*}}{\mathrm{d}x} - u^{*}\frac{\mathrm{d}v}{\mathrm{d}x}\bigg) \bigg]_a^b$

For some boundary conditions the final term here is zero and then we will have

$(Lu, v) = (u, Lv)$

When this happens, the operator in conjunction with the boundary conditions is said to be self-adjoint. As an example, a so-called regular Sturm-Liouville problem involves solving the differential equation

$\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} \phi}{\mathrm{d} x} \bigg) + \big(q(x) + \lambda w(x)\big) \phi = 0$

subject to what are called separated boundary conditions, taking the form

$A_1 \phi(a) + A_2 \phi^{\prime}(a) = 0$

and

$B_1 \phi(b) + B_2 \phi^{\prime}(b) = 0$

In this case, the operator $L$ is self-adjoint. To see this, suppose the functions $u$ and $v$ satisfy these boundary conditions (the constants $A_1$, $A_2$, $B_1$, $B_2$ being real). Then at $a$ we have

$A_1 u(a)^{*} + A_2 u^{\prime}(a)^{*} = 0$

and

$A_1 v(a) + A_2 v^{\prime}(a) = 0$

Since at least one of $A_1$ and $A_2$ is nonzero, this homogeneous linear system in $(A_1, A_2)$ has a nontrivial solution, so the determinant of its coefficient matrix must vanish:

$u(a)^{*} v^{\prime}(a) - v(a) u^{\prime}(a)^{*} = 0$

$\implies$

$v(a) u^{\prime}(a)^{*} = u(a)^{*} v^{\prime}(a)$

Similarly, at the boundary point $b$ we find that

$v(b) u^{\prime}(b)^{*} = u(b)^{*} v^{\prime}(b)$

These results then imply

$\bigg[p(x) \bigg(v \frac{\mathrm{d}u^{*}}{\mathrm{d}x} - u^{*}\frac{\mathrm{d}v}{\mathrm{d}x}\bigg) \bigg]_a^b = 0$

so the operator $L$ is self-adjoint as claimed. As another example, a singular Sturm-Liouville problem involves solving the same differential equation as in the regular problem, but on an interval where $p(x)$ is zero at either $a$ or $b$ or both, while being positive for $a < x < b$. At a boundary point where $p$ vanishes, the solution $\phi$ and its derivative are required only to remain bounded; if $p(x)$ does not vanish at one of the boundary points, then $\phi$ is required to satisfy the same boundary condition at that point as in the regular problem. In either case the boundary term vanishes, so we will have

$\bigg[p(x) \bigg(v \frac{\mathrm{d}u^{*}}{\mathrm{d}x} - u^{*}\frac{\mathrm{d}v}{\mathrm{d}x}\bigg) \bigg]_a^b = 0$

in this case too, so the operator $L$ will also be self-adjoint in the case of a singular Sturm-Liouville problem. As a final example, suppose the Sturm-Liouville problem involves solving the same differential equation as before, but with periodic boundary conditions of the form

$\phi(a) = \phi(b)$

$\phi^{\prime}(a) = \phi^{\prime}(b)$

and

$p(a) = p(b)$

Then if $u$ and $v$ are two functions satisfying these boundary conditions we will have

$\bigg[p(x) \bigg(v \frac{\mathrm{d}u^{*}}{\mathrm{d}x} - u^{*}\frac{\mathrm{d}v}{\mathrm{d}x}\bigg) \bigg]_a^b$

$= p(b) \bigg(v(b) u^{\prime}(b)^{*} - u(b)^{*} v^{\prime}(b)\bigg) - p(a) \bigg(v(a) u^{\prime}(a)^{*} - u(a)^{*} v^{\prime}(a)\bigg)$

$= p(a) \bigg[\bigg(v(b) u^{\prime}(b)^{*} - v(a) u^{\prime}(a)^{*}\bigg) + \bigg(u(a)^{*} v^{\prime}(a) - u(b)^{*} v^{\prime}(b)\bigg)\bigg] = 0$

So again, the operator $L$ will be self-adjoint in the case of periodic boundary conditions. We will see later that the singular and periodic cases arise when attempting to solve Schrödinger’s equation for the hydrogen atom.
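The self-adjointness arguments above can be checked numerically. The sketch below discretises $L$ on an interior grid (the coefficient functions $p(x) = 1 + x^2$ and $q(x) = \cos x$ are my own illustrative choices) and verifies that $(Lu, v) = (u, Lv)$ for functions vanishing at the endpoints:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200
a, b = 0.0, 1.0
h = (b - a) / (N + 1)
x = a + h * np.arange(1, N + 1)          # interior grid points

p = lambda t: 1.0 + t**2                 # illustrative coefficient functions
q = np.cos(x)

# Discrete Sturm-Liouville operator with u(a) = u(b) = 0:
# (Lf)_i = [p_{i+1/2}(f_{i+1} - f_i) - p_{i-1/2}(f_i - f_{i-1})]/h^2 + q_i f_i
p_plus = p(x + h / 2)
p_minus = p(x - h / 2)
L = (np.diag(-(p_plus + p_minus) / h**2 + q)
     + np.diag(p_plus[:-1] / h**2, 1)
     + np.diag(p_minus[1:] / h**2, -1))

# Two random complex "functions" vanishing at the endpoints
u = rng.standard_normal(N) + 1j * rng.standard_normal(N)
v = rng.standard_normal(N) + 1j * rng.standard_normal(N)

inner = lambda f, g: h * np.sum(np.conj(f) * g)   # discrete (f, g)
print(abs(inner(L @ u, v) - inner(u, L @ v)))     # ~ 0, i.e. (Lu, v) = (u, Lv)
```

The discrete operator comes out as a real symmetric matrix, which is precisely the finite-dimensional analogue of self-adjointness.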

The key reason for focusing so much on the self-adjoint property of the operator $L$ is that the eigenvalues of a self-adjoint operator are always real, and the eigenfunctions are orthogonal. Note that by orthogonality of the eigenfunctions in the more general context we mean that

$(\phi_n, \phi_m)_w = \int_a^b \mathrm{d} x w(x) \phi_n(x)^{*} \phi_m(x) = 0$

whenever $\phi_n(x)$ and $\phi_m(x)$ are eigenfunctions corresponding to two distinct eigenvalues.

To prove that the eigenvalues are always real, suppose that $\phi(x)$ is an eigenfunction corresponding to an eigenvalue $\lambda$. Then we have

$L \phi = - \lambda w \phi$

and so

$(L \phi, \phi) = (- \lambda w \phi, \phi) = \int_a^b (- \lambda w \phi)^{*} \phi \mathrm{d} x = -\lambda^{*} \int_a^b (w \phi)^{*} \phi \mathrm{d} x = -\lambda^{*}\int_a^b \mathrm{d}x w(x)|\phi(x)|^2$

But we also have

$(\phi, L \phi) = (\phi, - \lambda w \phi) = \int_a^b \phi^{*}(- \lambda w \phi) \mathrm{d} x = -\lambda \int_a^b \phi^{*} (w \phi) \mathrm{d} x = -\lambda\int_a^b \mathrm{d}x w(x)|\phi(x)|^2$

Therefore if the operator is self-adjoint we can write

$(L \phi, \phi) - (\phi, L \phi) = (\lambda - \lambda^{*}) \int_a^b \mathrm{d}x w(x)|\phi(x)|^2 = 0$

$\implies$

$\lambda = \lambda^{*}$

since $\int_a^b \mathrm{d}x w(x)|\phi(x)|^2 > 0$, so the eigenvalues must be real. In particular, this must be the case for regular and singular Sturm-Liouville problems, and for Sturm-Liouville problems involving periodic boundary conditions.

To prove that the eigenfunctions are orthogonal, let $\phi(x)$ and $\psi(x)$ denote two eigenfunctions corresponding to distinct eigenvalues $\lambda$ and $\mu$ respectively. Then we have

$L \phi = - \lambda w \phi$

$L \psi = - \mu w \psi$

and so by the self-adjoint property we can write

$(L \phi, \psi) - (\phi, L \psi) = \int_a^b (- \lambda w \phi)^{*} \psi \mathrm{d} x - \int_a^b \phi^{*} (- \mu w \psi) \mathrm{d} x$

$= (\mu - \lambda) \int_a^b \mathrm{d}x w(x)\phi(x)^{*} \psi(x) = 0$

Since $\lambda$ is real (as proved above) and the eigenvalues are distinct, the only way this can happen is if

$(\phi, \psi)_w = \int_a^b \mathrm{d}x w(x)\phi(x)^{*} \psi(x) = 0$

so the eigenfunctions must be orthogonal as claimed.

In addition to being orthogonal, the eigenfunctions $\phi_n(x)$, $n = 1, 2, 3, \dots$, of a Sturm-Liouville problem with specified boundary conditions also form a complete set of functions (I will not prove this here), which means that any sufficiently well-behaved function $f(x)$ for which $\int_a^b\mathrm{d} x |f(x)|^2$ exists can be represented by a Fourier series of the form

$f(x) = \sum_{n=1}^{\infty} a_n \phi_n(x)$

for $x \in [a, b]$, where the coefficients $a_n$ are given by the formula

$a_n = \frac{(\phi_n, f)_w}{(\phi_n, \phi_n)_w} = \frac{\int_a^b \mathrm{d}x w(x) \phi_n(x)^{*} f(x)}{\int_a^b \mathrm{d}x w(x) |\phi_n(x)|^2}$
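As a concrete example of this coefficient formula (the function $f$ is my own choice, using the vibrating-string eigenfunctions $\phi_n(x) = \mathrm{sin}(nx)$ with $w(x) = 1$): for $f(x) = x$ on $(0, \pi)$ the exact coefficients are $a_n = 2(-1)^{n+1}/n$, which a midpoint-rule computation reproduces:

```python
import numpy as np

# Coefficients a_n = (phi_n, f)_w / (phi_n, phi_n)_w for f(x) = x on (0, pi),
# with phi_n(x) = sin(n x) and w(x) = 1; exact values are a_n = 2(-1)^(n+1)/n.
M = 100000
x = (np.arange(M) + 0.5) * (np.pi / M)   # midpoint grid on (0, pi)
dx = np.pi / M
f = x

def coefficient(n):
    phi = np.sin(n * x)
    return (np.sum(phi * f) * dx) / (np.sum(phi * phi) * dx)

print([round(coefficient(n), 4) for n in (1, 2, 3)])  # [2.0, -1.0, 0.6667]
```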

It is the completeness and orthogonality of the eigenfunctions that makes Sturm-Liouville theory so useful in solving linear differential equations, because (for example) it means that the solutions of many second-order inhomogeneous linear differential equations of the form

$\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} \phi}{\mathrm{d} x} \bigg) + q(x) \phi = F(x)$

with suitable boundary conditions can be expressed as a linear combination of the eigenfunctions of the corresponding Sturm-Liouville problem

$\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} \phi}{\mathrm{d} x} \bigg) + \big(q(x) + \lambda w(x)\big) \phi = 0$

with the same boundary conditions. To illustrate this, suppose this Sturm-Liouville problem with boundary conditions $\phi(a) = \phi(b) = 0$ has an infinite set of eigenvalues $\lambda_k$ and corresponding eigenfunctions $\phi_k(x)$, $k = 1, 2, 3, \dots$, which are orthogonal and form a complete set. We will assume that the solution of the inhomogeneous differential equation above is an infinite series of the form

$\phi(x) = \sum_{k = 1}^{\infty} a_k \phi_k(x)$

where the coefficients $a_k$ are constants, and we will find these coefficients using the orthogonality of the eigenfunctions. Since for each $k$ it is true that

$\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p \frac{\mathrm{d} \phi_k}{\mathrm{d} x} \bigg) + q \phi_k = - \lambda_k w(x) \phi_k$

we can write

$\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p \frac{\mathrm{d} \phi}{\mathrm{d} x} \bigg) + q \phi$

$= \frac{\mathrm{d}}{\mathrm{d} x} \bigg(p \sum_{k = 1}^{\infty} a_k \frac{\mathrm{d} \phi_k}{\mathrm{d} x} \bigg) + q \sum_{k=1}^{\infty} a_k \phi_k$

$= \sum_{k=1}^{\infty} a_k \bigg[\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p \frac{\mathrm{d} \phi_k}{\mathrm{d} x} \bigg) + q \phi_k\bigg]$

$= \sum_{k=1}^{\infty} a_k\big[- \lambda_k w(x) \phi_k\big]$

$= - \sum_{k=1}^{\infty} a_k \lambda_k w(x) \phi_k$

Thus, in the inhomogeneous equation

$\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} \phi}{\mathrm{d} x} \bigg) + q(x) \phi = F(x)$

we can put (relabelling the variable as $u$ in readiness for the integrations that follow)

$F(u) = - \sum_{k=1}^{\infty} a_k \lambda_k w(u) \phi_k(u)$

To find the $m$th coefficient $a_m$ we can multiply both sides by $\phi_m(u)^{*}$ and integrate. By orthogonality, all the terms in the sum on the right will vanish except the one involving $\phi_m(u)$. We will get

$\int_a^b \phi_m(u)^{*} F(u) \mathrm{d}u = - \int_a^b a_m \lambda_m w(u) \phi_m(u)^{*}\phi_m(u) \mathrm{d} u = -a_m \lambda_m (\phi_m, \phi_m)_w$

$\implies$

$a_m = -\int_a^b \frac{\phi_m(u)^{*} F(u)}{\lambda_m (\phi_m, \phi_m)_w}\mathrm{d} u$

Having found a formula for the coefficients $a_k$, we can now write the solution of the original inhomogeneous differential equation as

$\phi(x) = \sum_{k = 1}^{\infty} a_k \phi_k(x)$

$= \sum_{k = 1}^{\infty} \bigg(-\int_a^b \frac{\phi_k(u)^{*} F(u)}{\lambda_k (\phi_k, \phi_k)_w}\mathrm{d} u\bigg) \phi_k(x)$

$= \int_a^b \mathrm{d} u \bigg(-\sum_{k = 1}^{\infty} \frac{\phi_k(u)^{*} \phi_k(x)}{\lambda_k (\phi_k, \phi_k)_w}\bigg) F(u)$

$= \int_a^b \mathrm{d} u G(x, u) F(u)$

where

$G(x, u) \equiv -\sum_{k = 1}^{\infty} \frac{\phi_k(u)^{*} \phi_k(x)}{\lambda_k (\phi_k, \phi_k)_w}$
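A minimal numerical check of this Green's function formula (the specific problem is my own illustrative choice): take $p = 1$, $q = 0$, $w = 1$ on $(0, \pi)$ with $\phi(0) = \phi(\pi) = 0$, so that $\lambda_k = k^2$, $\phi_k(x) = \mathrm{sin}(kx)$ and $(\phi_k, \phi_k)_w = \pi/2$. For $F(x) = \mathrm{sin}\, x$ the inhomogeneous equation is $\phi^{\prime\prime} = \mathrm{sin}\, x$, whose exact solution satisfying the boundary conditions is $\phi(x) = -\mathrm{sin}\, x$:

```python
import numpy as np

# Truncated eigenfunction expansion of the Green's function for phi'' = F on
# (0, pi) with phi(0) = phi(pi) = 0: lambda_k = k^2, phi_k = sin(k x),
# (phi_k, phi_k)_w = pi/2.
M = 4000
u = (np.arange(M) + 0.5) * (np.pi / M)   # midpoint integration grid
du = np.pi / M
F = np.sin(u)                            # right-hand side; exact solution -sin x

def phi(x, kmax=50):
    G = -sum(np.sin(k * u) * np.sin(k * x) / (k**2 * (np.pi / 2))
             for k in range(1, kmax + 1))
    return np.sum(G * F) * du

print(phi(np.pi / 2))  # close to -1 = -sin(pi/2)
```

Only the $k = 1$ term survives the integration here because $F$ is itself proportional to an eigenfunction; for a general $F$ the whole truncated series contributes.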

To conclude this note, I want to go back to a previous note in which I explored in detail the solution of Schrödinger’s equation for the hydrogen atom by the method of separation of variables. This approach reduced Schrödinger’s partial differential equation into a set of three uncoupled ordinary differential equations which we can now see are in fact Sturm-Liouville problems. As discussed in my previous note, Schrödinger’s three-dimensional equation for the hydrogen atom can be written in spherical polar coordinates as

$\frac{1}{r^2} \frac{\partial }{\partial r}\big( r^2 \frac{\partial \psi}{\partial r}\big) + \frac{1}{r^2 \sin \theta}\frac{\partial }{\partial \theta}\big( \sin \theta \frac{\partial \psi}{\partial \theta} \big) + \frac{1}{r^2 \sin^2 \theta}\frac{\partial^2 \psi}{\partial \phi^2} + \frac{2m_e}{\hbar^2}(E - U) \psi = 0$

and after solving this by the usual separation of variables approach starting from the assumption that the $\psi$ function can be expressed as a product

$\psi(r, \theta, \phi) = R(r) \Phi(\phi) \Theta(\theta)$

we end up with an equation for $R$ (the radial equation) of the form

$\frac{1}{r^2} \frac{d}{d r}\big( r^2 \frac{d R}{d r}\big) + \big[ \frac{2m_e}{\hbar^2}(E - U) - \frac{\lambda}{r^2} \big] R = 0$

and equations for $\Phi$ and $\Theta$ of the forms

$\frac{d^2 \Phi}{d \phi^2} + k \Phi = 0$

and

$\frac{1}{\sin \theta}\frac{d}{d \theta}\big(\sin \theta \frac{d \Theta}{d \theta}\big) + \big( \lambda - \frac{k}{\sin^2 \theta}\big) \Theta = 0$

respectively.

Taking each of these in turn, we first observe that the radial equation is of Sturm-Liouville form with $p(r) = r^2$, the eigenvalues corresponding to the energy term $E$ in the equation. The variable $r$ can range between $0$ and $\infty$, and the boundary conditions are formulated so that the solutions of the radial equation remain bounded as $r \rightarrow 0$ and go to zero as $r \rightarrow \infty$. Since $p(0) = 0$, the radial equation is a singular Sturm-Liouville problem.

Next, we observe that the equation for $\Phi$ is essentially the same as equation (13) for the vibrating string in the extract from Courant & Hilbert discussed at the start of this note. The azimuth angle $\phi$ can take any value in $(-\infty, \infty)$, but the function $\Phi$ must take a single value at each point in space (since this is a required property of the quantum wave function of which $\Phi$ is a factor). It follows that $\Phi$ must be periodic, taking the same value at $\phi$ and $\phi + 2\pi$ for any given $\phi$. This implies the conditions $\Phi(0) = \Phi(2 \pi)$ and $\Phi^{\prime}(0) = \Phi^{\prime}(2\pi)$. Furthermore, we have $p(\phi) = 1$ for all $\phi$, so in particular $p(0) = p(2\pi)$. Thus, the equation for $\Phi$ is a Sturm-Liouville problem with periodic boundary conditions.

Finally, as discussed in my previous note, the $\Theta$ equation can be rewritten (putting $k = m^2$ with $m$ an integer, as required by the periodic solutions of the $\Phi$ equation) as

$(1 - x^2) \frac{d^2 \Theta}{d x^2} - 2x \frac{d \Theta}{d x} + \big(\lambda - \frac{m^2}{1 - x^2} \big) \Theta = 0$

$\iff$

$\frac{d}{d x}\bigg((1 - x^2) \frac{d \Theta}{d x}\bigg) + \big(\lambda - \frac{m^2}{1 - x^2} \big) \Theta = 0$

where $x = \cos \theta$ and thus $-1 \leq x \leq 1$. This is a Sturm-Liouville problem with $p(x) = 1 - x^2$, and the boundary conditions are given by the requirement that $\Theta$ should remain bounded for all $x$ in $[-1, 1]$, including the endpoints $x = \pm 1$ (i.e., $\theta = 0$ and $\theta = \pi$). Since $p(x) = 0$ at both ends of the interval $[-1, 1]$, this equation can be classified as a singular Sturm-Liouville problem. The eigenvalue in this equation is $\lambda$.
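For readers who want to check the $\Theta$ equation concretely, the bounded solutions of this singular Sturm-Liouville problem are the associated Legendre functions $P_l^m(x)$ with eigenvalue $\lambda = l(l+1)$. The following sketch (Python with sympy; the sample values of $l$ and $m$ are my own illustrative choices) verifies this directly:

```python
import sympy as sp

x = sp.symbols('x')

# Verify that the associated Legendre functions P_l^m(x) satisfy
#   d/dx[(1 - x^2) dTheta/dx] + (lambda - m^2/(1 - x^2)) Theta = 0
# with eigenvalue lambda = l(l + 1)
for l, m in [(1, 0), (2, 1), (3, 2)]:
    Theta = sp.assoc_legendre(l, m, x)
    lam = l * (l + 1)
    residual = (sp.diff((1 - x**2) * sp.diff(Theta, x), x)
                + (lam - m**2 / (1 - x**2)) * Theta)
    assert sp.simplify(residual) == 0
print("associated Legendre functions solve the Theta equation")
```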

# Simple variational setups yielding Newton’s Second Law and Schrödinger’s equation

It is a delightful fact that one can get both the fundamental equation of classical mechanics (Newton’s Second Law) and the fundamental equation of quantum mechanics (Schrödinger’s equation) by solving very simple variational problems based on the familiar conservation of mechanical energy equation

$K + U = E$

In the present note I want to briefly set these out emphasising the common underlying structure provided by the conservation of mechanical energy and the calculus of variations. The kinetic energy $K$ will be taken to be

$K = \frac{1}{2}m \dot{x}^2 = \frac{p^2}{2m}$

where $\dot{x} = \frac{\mathrm{d}x}{\mathrm{d}t}$ is the particle’s velocity, $p = m\dot{x}$ is its momentum, and $m$ is its mass. The potential energy $U$ will be regarded as some function of $x$ only.

To obtain Newton’s Second Law we find the stationary path followed by the particle with respect to the functional

$S[x] = \int_{t_1}^{t_2} L(t, x, \dot{x}) dt = \int_{t_1}^{t_2} (K - U) dt$

The function $L(t, x, \dot{x}) = K - U$ is usually termed the ‘Lagrangian’ in classical mechanics. The functional $S[x]$ is usually called the ‘action’. The Euler-Lagrange equation for this calculus of variations problem is

$\frac{\partial L}{\partial x} - \frac{\mathrm{d}}{\mathrm{d}t}\big(\frac{\partial L}{\partial \dot{x}}\big) = 0$

and this is Newton’s Second Law in disguise! We have

$\frac{\partial L}{\partial x} = -\frac{\mathrm{d}U}{\mathrm{d}x} \equiv F$

$\frac{\partial L}{\partial \dot{x}} = m\dot{x} \equiv p$

and

$\frac{\mathrm{d}}{\mathrm{d}t} \big(\frac{\partial L}{\partial \dot{x}}\big) = \frac{\mathrm{d}p}{\mathrm{d}t} = m\ddot{x} \equiv ma$

so substituting these into the Euler-Lagrange equation we get Newton’s Second Law, $F = ma$.
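This calculation is easy to check symbolically. The sketch below uses sympy's `euler_equations`, with a harmonic potential $U = \frac{1}{2}kx^2$ chosen purely for concreteness (this specific potential is my own illustrative choice, not part of the argument above):

```python
import sympy as sp
from sympy.calculus.euler import euler_equations

t, m, k = sp.symbols('t m k', positive=True)
x = sp.Function('x')

# Lagrangian L = K - U with the illustrative choice U = k x^2 / 2
L = sp.Rational(1, 2) * m * sp.diff(x(t), t)**2 - sp.Rational(1, 2) * k * x(t)**2
eq, = euler_equations(L, [x(t)], t)

# The Euler-Lagrange equation should read -k x - m x'' = 0, i.e. F = ma
assert sp.simplify(eq.lhs + k * x(t) + m * sp.diff(x(t), t, 2)) == 0
```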

To obtain Schrödinger’s equation we introduce a function

$\psi(x) = \exp\big(\frac{1}{\hbar}\int p\,\mathrm{d}x\big)$

where $p = m \dot{x}$ is again the momentum of the particle and $\hbar$ is the reduced Planck constant from quantum mechanics. (Note that $\int p \, \mathrm{d}x$ has units of mass length$^2$ time$^{-1}$, so we need to remove these by dividing by $\hbar$, which has the same units; the function $\psi(x)$ in quantum mechanics is dimensionless.) We then have

$\ln \psi = \frac{1}{\hbar}\int p\,\mathrm{d}x$

and differentiating both sides gives

$\frac{\psi^{\prime}}{\psi} = \frac{1}{\hbar} p$

so

$p^2 = \hbar^2 \big(\frac{\psi^{\prime}}{\psi}\big)^2$

Therefore we can write the kinetic energy as

$K = \frac{\hbar^2}{2m}\big(\frac{\psi^{\prime}}{\psi}\big)^2$

and putting this into the conservation of mechanical energy equation gives

$\frac{\hbar^2}{2m}\big(\frac{\psi^{\prime}}{\psi}\big)^2 + U = E$

$\iff$

$\frac{\hbar^2}{2m} (\psi^{\prime})^2 + (U - E) \psi^2 = 0$

We now find the stationary path followed by the particle with respect to the functional

$T[\psi] = \int_{-\infty}^{\infty} M(x, \psi, \psi^{\prime}) \mathrm{d}x = \int_{-\infty}^{\infty} \big(\frac{\hbar^2}{2m} (\psi^{\prime})^2 + (U - E) \psi^2\big)\mathrm{d}x$

The Euler-Lagrange equation for this calculus of variations problem is

$\frac{\partial M}{\partial \psi} - \frac{\mathrm{d}}{\mathrm{d}x}\big(\frac{\partial M}{\partial \psi^{\prime}}\big) = 0$

and this is Schrödinger’s equation in disguise! We have

$\frac{\partial M}{\partial \psi} = 2(U - E)\psi$

$\frac{\partial M}{\partial \psi^{\prime}} = \frac{\hbar^2}{m} \psi^{\prime}$

and

$\frac{\mathrm{d}}{\mathrm{d}x} \big(\frac{\partial M}{\partial \psi^{\prime}}\big) = \frac{\hbar^2}{m} \psi^{\prime \prime}$

so substituting these into the Euler-Lagrange equation we get

$2(U - E) \psi - \frac{\hbar^2}{m} \psi^{\prime \prime} = 0$

$\iff$

$-\frac{\hbar^2}{2m} \frac{\mathrm{d}^2 \psi}{\mathrm{d} x^2} + U \psi = E \psi$

and this is the (time-independent) Schrödinger equation for a particle of mass $m$ with fixed total energy $E$ in a potential $U(x)$ on the line $-\infty < x < \infty$.
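Again the Euler-Lagrange step can be verified symbolically; the sketch below (sympy, with $U$ left as an arbitrary function of $x$) confirms that the functional $T[\psi]$ yields the time-independent Schrödinger equation:

```python
import sympy as sp
from sympy.calculus.euler import euler_equations

x, hbar, m, E = sp.symbols('x hbar m E', positive=True)
psi = sp.Function('psi')
U = sp.Function('U')

# The integrand M = (hbar^2 / 2m) psi'^2 + (U - E) psi^2 of the functional T
M = hbar**2 / (2 * m) * sp.diff(psi(x), x)**2 + (U(x) - E) * psi(x)**2
eq, = euler_equations(M, [psi(x)], x)

# Halving the Euler-Lagrange equation should give the time-independent
# Schrodinger equation  -(hbar^2 / 2m) psi'' + (U - E) psi = 0
schrodinger = -hbar**2 / (2 * m) * sp.diff(psi(x), x, 2) + (U(x) - E) * psi(x)
assert sp.simplify(eq.lhs / 2 - schrodinger) == 0
```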

# A fluid-mechanical visualisation of the quantum-mechanical continuity equation

The concept of a probability current is useful in quantum mechanics for analysing quantum scattering and tunnelling phenomena, among other things. However, I have noticed that the same rather abstract and non-visual approach to introducing probability currents is repeated almost verbatim in every textbook (also see, e.g., this Wikipedia article). The standard approach essentially involves defining a probability current from the outset as

$\vec{j} = \frac{i \hbar}{2m}(\Psi \nabla \Psi^{*} - \Psi^{*} \nabla \Psi)$

and then using Schrödinger’s equation to show that this satisfies a fluid-like continuity equation of the form

$\frac{\partial \rho}{\partial t} + \nabla \cdot \vec{j} = 0$

with

$\rho \equiv \Psi^{*} \Psi$

In the present note I want to briefly explore a more intuitive and visual approach involving a model of the actual flow of a ‘probability fluid’. I want to begin with a fluid-mechanical model and then obtain the standard expression for the quantum-mechanical continuity equation from this, rather than starting out with an abstract definition of the probability current and then showing that this satisfies a continuity equation. The essential problem one faces when trying to do this is that although in classical mechanics the position $\vec{r}(t)$ of a point particle and its velocity $\vec{v}(t) = d\vec{r}(t)/dt$ are well defined, this is not the case in conventional quantum mechanics. Quantum mechanics is done probabilistically, treating a particle as a wave packet such that the square of the amplitude of the corresponding wave function acts as a probability density which can be used to measure the probability that the particle will occupy a particular region of space at a particular time. It is not possible to say definitively where a particular particle will be at a particular time in quantum mechanics, which makes it difficult to apply the conventional deterministic equations of fluid mechanics.

A huge specialist literature on quantum hydrodynamics has in fact arisen which tries to circumvent this problem in a number of ways. A standard reference is Wyatt, R. E., 2005, Quantum Dynamics with Trajectories: Introduction to Quantum Hydrodynamics (Springer). The route that a large part of this literature has taken is intriguing because it is based on Bohmian mechanics, an approach to quantum mechanics developed by David Bohm in 1952 which is regarded by most mainstream physicists today as unconventional. The key feature of the Bohmian mechanics approach is that classical-like particle trajectories are possible. Using this approach one can obtain Newtonian-like equations of motion analogous to those in conventional fluid mechanics and this is how this particular literature seems to have chosen to treat quantum particle trajectories in a fluid-like way. Attempts have also been made to introduce mathematically equivalent approaches, but defined within conventional quantum mechanics (see, e.g., Brandt, S. et al, 1998, Quantile motion and tunneling, Physics Letters A, Volume 249, Issue 4, pp. 265-270).

In the present note I am not looking to solve any elaborate problems so I will simply consider a free quantum wave packet which is not acted upon by any forces and try to visualise probability currents and the quantum continuity equation in a fluid-like way by using the momentum vector operator $\hat{\vec{p}}$ to characterise the velocity of the particle. I will then show that the probability current obtained in this fluid-mechanical model is the same as the one defined abstractly in textbooks.

In quantum mechanics, calculations are done using operators to represent observables. Every possible observable that one might be interested in for the purposes of experiment has a corresponding operator which the mathematics of quantum mechanics can work on to produce predictions. The key operator for the purposes of the present note is the momentum vector operator

$\hat{\vec{p}} = -i \hbar \nabla$

which is the quantum mechanical analogue of the classical momentum vector

$\vec{p} = m \vec{v}$

The key idea for the present note is to regard the velocity vector of the quantum particle as being represented by the operator

$\frac{\hat{\vec{p}}}{m}$

by analogy with the classical velocity vector which can be obtained as

$\vec{v} = \frac{\vec{p}}{m}$

We will imagine the total probability mass

$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \Psi^{*} \Psi \text{d}V = 1$

as a fluid in steady flow throughout the whole of space and obeying mass conservation. The fact that the flow is steady reflects the fact that there are no forces acting on the quantum particle in this model, so we must have

$\frac{\partial }{\partial t}\big[\frac{\hat{\vec{p}}}{m}\big] = 0$

The velocity can vary from point to point in the probability fluid but at any given point it cannot be varying over time.

In a classical fluid we have a mass density per unit volume $\rho$ and we regard the velocity vector $\vec{v}$ as a volumetric flow rate per unit area, i.e., the volume of fluid that would pass through a unit area per unit time. Then $\rho \vec{v}$ is the mass flow rate per unit area, i.e., the mass of fluid that would pass through a unit area per unit time. In quantum mechanics we can regard the probability mass density per unit volume $\rho \equiv \Psi^{*} \Psi$ as analogous to the mass density of a classical fluid. We can interpret $\frac{\hat{\vec{p}}}{m}$ as the volumetric flow rate per unit area, i.e., the volume of probability fluid that would pass through a unit area per unit time. When doing probability calculations with quantum mechanical operators we usually ‘sandwich’ the operator between $\Psi^{*}$ and $\Psi$, so following that approach here we can define the probability current density as

$\Psi^{*} \frac{\hat{\vec{p}}}{m} \Psi$

This is to be interpreted as the probability mass flow rate per unit area, i.e., the amount of probability mass that would pass through a unit area per unit time, analogous to $\vec{j} = \rho \vec{v}$ in the classical case. To see how close the analogy is, suppose the quantum wave function is that of a plane wave

$\Psi(x, y, z, t) = A\mathrm{e}^{i(\vec{k} \cdot \vec{r} - \omega t)}$

Then

$\Psi^{*} \frac{\hat{\vec{p}}}{m} \Psi = A \mathrm{e}^{-i(\vec{k} \cdot \vec{r} - \omega t)} \frac{(-i \hbar)}{m} \nabla A \mathrm{e}^{i(\vec{k} \cdot \vec{r} - \omega t)}$

$= A \mathrm{e}^{-i(\vec{k} \cdot \vec{r} - \omega t)} \frac{(-i \hbar)}{m} A \ i \ \vec{k} \ \mathrm{e}^{i(\vec{k} \cdot \vec{r} - \omega t)}$

$= A^2 \frac{\hbar \vec{k}}{m}$

$= \rho \vec{v}$

which looks just like the mass flow rate in the classical case with $\rho = \Psi^{*} \Psi$ and $\vec{v} \equiv \frac{\hbar \vec{k}}{m}$. Note that in this example the probability current density formula we are using, namely $\Psi^{*} \frac{\hat{\vec{p}}}{m} \Psi$, turned out to be real-valued. Unfortunately this will not always be the case. Since the probability current vector must always be real-valued, the fluid-mechanical model in the present note will only be applicable in cases when this is true for the formula $\Psi^{*} \frac{\hat{\vec{p}}}{m} \Psi$.
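The plane-wave computation above is easy to reproduce symbolically; here is a one-dimensional sketch in Python with sympy:

```python
import sympy as sp

x, t, A, k, omega, hbar, m = sp.symbols('x t A k omega hbar m', positive=True)

# One-dimensional plane wave Psi = A exp(i(k x - omega t))
Psi = A * sp.exp(sp.I * (k * x - omega * t))

# Probability current Psi* (p_hat / m) Psi with p_hat = -i hbar d/dx
current = sp.conjugate(Psi) * (-sp.I * hbar / m) * sp.diff(Psi, x)

# This should reduce to the real quantity A^2 hbar k / m = rho * v
assert sp.simplify(current - A**2 * hbar * k / m) == 0
```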

As in classical fluid mechanics, a continuity equation can now be derived by considering the net outflow of probability mass from an infinitesimal fluid element of volume $\mathrm{d}V \equiv \mathrm{d} x \mathrm{d} y \mathrm{d} z$.

Considering only the $y$-component for the moment, we see from the diagram that on the left-hand side we have the probability mass flow rate coming into the volume element through the left-hand face. The mass flow rate coming out of the fluid element through the right-hand face can be approximated using a Taylor series expansion as being equal to the mass flow rate through the left-hand face plus a differential adjustment based on the gradient of the probability current density and the length $\mathrm{d} y$. The net probability mass flow rate in the $y$-direction is then obtained by subtracting the left-hand term from the right-hand term to get

$\frac{\partial }{\partial y} \big(\Psi^{*}\frac{\hat{\vec{p}}}{m} \Psi\big) \mathrm{d}V$

Using similar arguments for the $x$ and $z$-directions, the net mass flow rate out of the fluid element in all three directions is then

$\nabla \cdot \big(\Psi^{*}\frac{\hat{\vec{p}}}{m} \Psi \big) \mathrm{d} V$

Now, the probability mass inside the fluid element is $\rho \mathrm{d} V$ where $\rho = \Psi^{*} \Psi$ and if there is a net outflow of probability fluid this mass will be decreasing at the rate

$- \frac{\partial \rho}{\partial t} \mathrm{d} V$

Equating the two expressions and dividing through by the volume of the fluid element we get the equation of continuity

$\frac{\partial \rho}{\partial t} + \nabla \cdot \big(\Psi^{*}\frac{\hat{\vec{p}}}{m} \Psi \big) = 0$

What I want to do now is show that if we work out $\Psi^{*}\frac{\hat{\vec{p}}}{m} \Psi$ we will get the same formula for the probability current as the one usually given in quantum mechanics textbooks. We have

$\Psi^{*}\frac{\hat{\vec{p}}}{m} \Psi$

$= - \frac{i \hbar}{m} \Psi^{*} \nabla \Psi$

$= - \frac{i \hbar}{2m} \Psi^{*} \nabla \Psi - \frac{i \hbar}{2m} \Psi^{*} \nabla \Psi$

We now note that since the probability current density must be a real vector, the expression above must equal its own real part, i.e., half of itself plus half of its complex conjugate. The two terms in the last equality are identical, so replacing the second one by its complex conjugate leaves the (real) value unchanged. Doing so, we get

$\Psi^{*}\frac{\hat{\vec{p}}}{m} \Psi$

$= - \frac{i \hbar}{2m} \Psi^{*} \nabla \Psi + \frac{i \hbar}{2m} \Psi \nabla \Psi^{*}$

$= \frac{i \hbar}{2m}(\Psi \nabla \Psi^{*} - \Psi^{*} \nabla \Psi)$

$= \vec{j}$

This is exactly the expression for the probability current density that appears in textbooks, but rather than introducing it ‘out of nowhere’ at the beginning, we have obtained it naturally as a result of a fluid-mechanical model.
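The real-part argument in the final step can also be checked directly. The sketch below (sympy, one dimension, with the wave function and its derivative written out in terms of real components $u, v, du, dv$; these symbol names are mine) confirms that the textbook current is exactly the real part of $\Psi^{*} \frac{\hat{\vec{p}}}{m} \Psi$:

```python
import sympy as sp

hbar, m, u, v, du, dv = sp.symbols('hbar m u v du dv', real=True)

# Write Psi = u + i v and its spatial derivative as du + i dv (real components)
Psi, dPsi = u + sp.I * v, du + sp.I * dv

naive = -sp.I * hbar / m * sp.conjugate(Psi) * dPsi
textbook = sp.I * hbar / (2 * m) * (Psi * sp.conjugate(dPsi) - sp.conjugate(Psi) * dPsi)

# The textbook current is precisely the real part of Psi* (p_hat/m) Psi
assert sp.simplify(sp.re(sp.expand(naive)) - textbook) == 0
```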

# Calculation of a quantum-mechanical commutator in three dimensions

I needed to work out the commutator $[\hat{H}, \hat{\vec{r}} \ ]$, where

$\hat{H} = -\frac{\hbar^2}{2m} \nabla^2 + \hat{U}$

is the Hamiltonian operator and $\hat{\vec{r}}$ is the 3D position vector operator. It is difficult to find any textbook or online source that explicitly goes through the calculation of this three-dimensional case (in fact, I have not been able to find any) so I am recording my calculation step-by-step in this note.

The commutator $[\hat{H}, \hat{\vec{r}} \ ]$ is a vector operator with components

$[\hat{H}, \hat{x} \ ] = \hat{H} \ \hat{x} - \hat{x} \ \hat{H}$

$[\hat{H}, \hat{y} \ ] = \hat{H} \ \hat{y} - \hat{y} \ \hat{H}$

and

$[\hat{H}, \hat{z} \ ] = \hat{H} \ \hat{z} - \hat{z} \ \hat{H}$

To evaluate these, note that the momentum operator (in position space) is

$\hat{\vec{p}} = -i \ \hbar \nabla$

and so we have

$\hat{H} = -\frac{\hbar^2}{2m} \nabla^2 + \hat{U}$

$= \frac{1}{2m} \hat{\vec{p}} \cdot \hat{\vec{p}} + \hat{U}$

$= \frac{1}{2m}(\hat{p}_x^{2} + \hat{p}_y^{2} + \hat{p}_z^{2}) + \hat{U}$

Looking at the $x$-component of $[\hat{H}, \hat{\vec{r}} \ ]$ we therefore have

$[\hat{H}, \hat{x} \ ] = \hat{H} \ \hat{x} - \hat{x} \ \hat{H} = \frac{\hat{p}_x^{2} \hat{x} + \hat{p}_y^{2} \hat{x} + \hat{p}_z^{2} \hat{x}}{2m} + \hat{U} \ \hat{x} - \big(\frac{\hat{x} \hat{p}_x^{2} + \hat{x} \hat{p}_y^{2} + \hat{x} \hat{p}_z^{2}}{2m} + \hat{x} \ \hat{U}\big)$

$= \frac{1}{2m}([\hat{p}_x^2, \hat{x} \ ] + [\hat{p}_y^2, \hat{x} \ ] + [\hat{p}_z^2, \hat{x} \ ]) + [\hat{U}, \hat{x} \ ]$

Since multiplication is commutative we have $[\hat{U}, \hat{x} \ ] = 0$. I will now show that we also have

$[\hat{p}_y^2, \hat{x} \ ] = [\hat{p}_z^2, \hat{x} \ ] = 0$

To see this, let us first work out in detail the effect of $[\hat{p}_y, \hat{x} \ ]$ on a wavefunction $\Psi$. We have

$[\hat{p}_y, \hat{x} \ ] \Psi = - i \ \hbar \frac{\partial (\hat{x} \Psi)}{\partial y} + \hat{x }i \ \hbar \frac{\partial \Psi}{\partial y}$

$= - i \ \hbar \hat{x} \frac{\partial \Psi}{\partial y} - i \ \hbar \Psi \frac{\partial \hat{x}}{\partial y} + \hat{x} i \ \hbar \frac{\partial \Psi}{\partial y}$

$= - i \ \hbar \Psi \frac{\partial \hat{x}}{\partial y} = 0$

where the last equality follows from the fact that $\hat{x}$ does not depend on $y$. Thus, $[\hat{p}_y, \hat{x} \ ] = 0$.

We can now easily show that $[\hat{p}_y^2, \hat{x} \ ] = 0$ because using the basic result for commutators that

$[AB, C] = A[B, C] + [A, C]B$

(easy to prove by writing out the terms in full) we find that

$[\hat{p}_y^2, \hat{x} \ ] = \hat{p}_y \ [\hat{p}_y, \hat{x} \ ] + [\hat{p}_y, \hat{x} \ ] \ \hat{p}_y = 0$

Identical arguments show that $[\hat{p}_z^2, \hat{x} \ ] = 0$. Thus, we can conclude that

$[\hat{H}, \hat{x} \ ] = \hat{H} \ \hat{x} - \hat{x} \ \hat{H} = \frac{1}{2m} [\hat{p}_x^2, \hat{x} \ ]$

It now only remains to work out $[\hat{p}_x^2, \hat{x} \ ]$ and we can do this by first working out in detail the effect of $[\hat{p}_x, \hat{x} \ ]$ on a wavefunction $\Psi$ (this is of course the ‘canonical commutation relation’ of quantum mechanics). We have

$[\hat{p}_x, \hat{x} \ ] \Psi = - i \ \hbar \frac{\partial (\hat{x} \Psi)}{\partial x} + \hat{x } i \ \hbar \frac{\partial \Psi}{\partial x}$

$= - i \ \hbar \hat{x} \frac{\partial \Psi}{\partial x} - i \ \hbar \Psi \frac{\partial \hat{x}}{\partial x} + \hat{x} i \ \hbar \frac{\partial \Psi}{\partial x}$

$= - i \ \hbar \Psi \frac{\partial \hat{x}}{\partial x} = - i \ \hbar \Psi$

where the last equality follows from the fact that $\hat{x}$ is the same as multiplying by $x$ so its derivative with respect to $x$ equals $1$. Thus, $[\hat{p}_x, \hat{x} \ ] = - i \ \hbar$. Then we find that

$[\hat{p}_x^2, \hat{x} \ ] = \hat{p}_x \ [\hat{p}_x, \hat{x} \ ] + [\hat{p}_x, \hat{x} \ ] \ \hat{p}_x = -2 i \ \hbar \hat{p}_x$

and we can conclude that

$[\hat{H}, \hat{x} \ ] = \hat{H} \ \hat{x} - \hat{x} \ \hat{H} = \frac{1}{2m} [\hat{p}_x^2, \hat{x} \ ] = - \frac{i \ \hbar}{m} \hat{p}_x$

Identical arguments show that

$\hat{H} \ \hat{y} - \hat{y} \ \hat{H} = \frac{1}{2m} [\hat{p}_y^2, \hat{y} \ ] = - \frac{i \ \hbar}{m} \hat{p}_y$

and

$\hat{H} \ \hat{z} - \hat{z} \ \hat{H} = \frac{1}{2m} [\hat{p}_z^2, \hat{z} \ ] = - \frac{i \ \hbar}{m} \hat{p}_z$

Thus, we finally reach our desired expression for the Hamiltonian-position commutator in three dimensions:

$[\hat{H}, \hat{\vec{r}} \ ] = - \frac{i \ \hbar}{m} \hat{\vec{p}}$
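As a check on this result, the sketch below (Python with sympy) applies $\hat{H}\hat{x} - \hat{x}\hat{H}$ to an arbitrary wavefunction and confirms that the outcome equals $-\frac{i \hbar}{m} \hat{p}_x \Psi = -\frac{\hbar^2}{m}\frac{\partial \Psi}{\partial x}$:

```python
import sympy as sp

x, y, z, hbar, m = sp.symbols('x y z hbar m')
Psi = sp.Function('Psi')(x, y, z)
U = sp.Function('U')(x, y, z)

def H(f):
    # Hamiltonian H f = -(hbar^2 / 2m) Laplacian(f) + U f
    laplacian = sum(sp.diff(f, q, 2) for q in (x, y, z))
    return -hbar**2 / (2 * m) * laplacian + U * f

# x-component of [H, r] acting on an arbitrary wavefunction Psi:
# it should equal -(i hbar / m) p_x Psi = -(hbar^2 / m) dPsi/dx
commutator = sp.expand(H(x * Psi) - x * H(Psi))
assert sp.simplify(commutator + hbar**2 / m * sp.diff(Psi, x)) == 0
```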

As an application of this result we will consider the problem of working out the expectation of position and velocity for a quantum particle. In three-dimensional space the quantum wave function is

$\Psi = \Psi(x, y, z, t)$

and we obtain the probability density function as

$\rho(x, y, z, t) = \Psi^{*} \Psi$

The wavefunction $\Psi$ satisfies the time-dependent Schrödinger equation

$i \hbar \frac{\partial \Psi}{\partial t} = -\frac{\hbar^2}{2m}\big(\frac{\partial^2 \Psi}{\partial x^2} + \frac{\partial^2 \Psi}{\partial y^2} + \frac{\partial^2 \Psi}{\partial z^2}\big) + U\Psi$

where $U = U(x, y, z, t)$ is some potential energy function. We can write the Schrödinger equation in operator form using Dirac notation as

$\hat{H} |\Psi \rangle = i \hbar \frac{\partial}{\partial t} |\Psi \rangle$

where

$\hat{H} = -\frac{\hbar^2}{2m} \nabla^2 + \hat{U}$

is the Hamiltonian operator (the Hamiltonian form of total energy) and

$i \hbar \frac{\partial }{\partial t}$

is the total energy operator. Note that the complex conjugate of the wavefunction $\Psi^{*}$ satisfies the Schrödinger equation written in Dirac notation as

$\langle \Psi | \hat{H} = - \langle \Psi | i \hbar \frac{\partial}{\partial t}$

In quantum mechanics we find the expected position $\langle \vec{r}(t) \rangle$ of the particle by integrating the position operator $\hat{\vec{r}}$ over all space, sandwiched between $\Psi^{*}$ and $\Psi$. Thus, letting $\mathrm{d}V \equiv \mathrm{d} x \mathrm{d} y \mathrm{d} z$ we have

$\langle \vec{r}(t) \rangle = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \Psi^{*} \ \hat{\vec{r}} \ \Psi \text{d}V \equiv \langle \Psi \ | \hat{\vec{r}} \ | \Psi \rangle$

where in the last term I have switched to using Dirac notation which will be useful shortly. The expected velocity can then be obtained by differentiating this integral with respect to $t$. We get

$\langle \vec{v}(t) \rangle = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \big[ \frac{\partial \Psi^{*}}{\partial t} \ \hat{\vec{r}} \ \Psi + \Psi^{*} \ \hat{\vec{r}} \ \frac{\partial \Psi}{\partial t} \big] \text{d}V + \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \Psi^{*} \ \frac{\partial \hat{\vec{r}}}{\partial t} \ \Psi \text{d}V$

The second triple integral on the right-hand side is zero because the position operator does not depend on time. The integrand in the first triple integral can be manipulated by using the operator form of the Schrödinger equation and Dirac notation to write

$\frac{\partial \Psi}{\partial t} = \frac{\partial}{\partial t} | \Psi \rangle = \frac{1}{i \ \hbar} \ \hat{H} \ | \Psi \rangle$

and

$\frac{\partial \Psi^{*}}{\partial t} = \langle \Psi | \frac{\partial}{\partial t} = -\frac{1}{i \ \hbar} \ \langle \Psi | \hat{H}$

Thus, we have

$\langle \vec{v}(t) \rangle = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \big[ \frac{\partial \Psi^{*}}{\partial t} \ \hat{\vec{r}} \ \Psi + \Psi^{*} \ \hat{\vec{r}} \ \frac{\partial \Psi}{\partial t} \big] \text{d}V$

$= - \frac{1}{i \ \hbar} \langle \Psi |\hat{H} \ \hat{\vec{r}} \ | \Psi \rangle + \frac{1}{i \ \hbar} \langle \Psi | \hat{\vec{r}} \ \hat{H} | \Psi \rangle$

$= \frac{i}{\hbar} \langle \Psi | \hat{H} \ \hat{\vec{r}} \ - \hat{\vec{r}} \ \hat{H} \ | \Psi \rangle$

$= \frac{i}{\hbar} \langle \Psi | \ [\hat{H}, \hat{\vec{r}} \ ] \ | \Psi \rangle$

$= \frac{1}{m} \langle \Psi | \ \hat{\vec{p}} \ | \Psi \rangle$

$= \big \langle \frac{\hat{\vec{p}}}{m} \big \rangle$

where the last two equalities follow from the result obtained above, namely that the commutator $[\hat{H}, \hat{\vec{r}} \ ]$ is equal to $-\frac{i \hbar}{m} \hat{\vec{p}}$. Therefore the expected velocity of a quantum particle looks a lot like the velocity of a classical particle (momentum divided by mass). The idea that quantum mechanical expectations exhibit Newtonian-like behaviour is the essence of the Ehrenfest Theorem of quantum mechanics.
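This Ehrenfest-type behaviour is easy to see numerically. The sketch below (numpy, with $\hbar = m = 1$, a free Gaussian wave packet evolved exactly in momentum space; all grid and packet parameters are my own illustrative choices) checks that $\frac{\mathrm{d}\langle x \rangle}{\mathrm{d}t} = \langle \hat{p} \rangle / m$:

```python
import numpy as np

hbar = m = 1.0  # natural units for this sketch

# Spatial grid and the matching angular wavenumbers
N, Lbox = 2048, 200.0
x = np.linspace(-Lbox / 2, Lbox / 2, N, endpoint=False)
dx = x[1] - x[0]
k = 2 * np.pi * np.fft.fftfreq(N, d=dx)

# Gaussian wave packet centred at the origin with mean momentum hbar*k0
k0, sigma = 1.5, 5.0
psi0 = np.exp(-x**2 / (4 * sigma**2) + 1j * k0 * x)
psi0 /= np.sqrt(np.sum(np.abs(psi0)**2) * dx)

def evolve(psi, t):
    # Exact free-particle evolution: multiply by exp(-i hbar k^2 t / 2m) in k-space
    return np.fft.ifft(np.fft.fft(psi) * np.exp(-1j * hbar * k**2 * t / (2 * m)))

def mean_x(psi):
    return np.real(np.sum(np.conj(psi) * x * psi) * dx)

def mean_p(psi):
    dpsi = np.fft.ifft(1j * k * np.fft.fft(psi))  # spectral derivative
    return np.real(np.sum(np.conj(psi) * (-1j * hbar) * dpsi) * dx)

# Compare d<x>/dt (central difference) with <p>/m at t = 1
t, dt = 1.0, 1e-3
v_expect = (mean_x(evolve(psi0, t + dt)) - mean_x(evolve(psi0, t - dt))) / (2 * dt)
assert abs(v_expect - mean_p(evolve(psi0, t)) / m) < 1e-6
```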

# A closer look at how the principal, orbital and magnetic quantum numbers arise in Schrödinger’s theory of the hydrogen atom

In this note I want to explore some aspects of the solution of Schrödinger’s three-dimensional wave equation in spherical polar coordinates which pertain to the three main quantum numbers characterising the electron in a hydrogen atom: the electron’s principal, orbital and magnetic quantum numbers. In particular, I have noticed that many physics and mathematics sources tend to gloss over some (or all) of the mathematical details of how permissibility conditions on the three quantum numbers emerge naturally from validity constraints on the relevant underlying differential equations. I want to bring out these details clearly in the present note.

In general, four quantum numbers are needed to fully describe atomic electrons in many-electron atoms. These four numbers and their permissible values are:

Principal quantum number $\tilde{n} = 1, 2, 3, \ldots$

Orbital quantum number $l = 0, 1, 2, \ldots, (\tilde{n} - 1)$

Magnetic quantum number $m_l = 0, \pm 1, \pm 2, \ldots, \pm l$

Spin magnetic quantum number $m_s = -\frac{1}{2}, +\frac{1}{2}$

The principal quantum number determines the electron’s energy, the orbital quantum number its orbital angular-momentum magnitude, the magnetic quantum number its orbital angular-momentum direction, and the spin magnetic quantum number its spin direction.

I have noticed that it is often not explained clearly why, for example, the orbital quantum number cannot exceed the principal quantum number minus one, or why the magnitude of the magnetic quantum number cannot exceed that of the orbital quantum number. I will explore issues like these in depth in the context of the time-independent Schrödinger equation for the hydrogen atom, which only involves the first three quantum numbers. I will not discuss the spin magnetic quantum number in the present note.

Schrödinger’s wave equation for the hydrogen atom

In Cartesian coordinates, Schrödinger’s three-dimensional equation for the electron in the hydrogen atom is

$\frac{\partial^2 \psi}{\partial x^2} + \frac{\partial^2 \psi}{\partial y^2} + \frac{\partial^2 \psi}{\partial z^2} + \frac{2m_e}{\hbar^2}(E - U) \psi = 0$

where $m_e$ denotes the electron mass. The potential energy $U$ is the electric potential energy of a charge $-e$ given that it is at distance $r$ from another charge $+e$, namely

$U = -\frac{e^2}{4 \pi \epsilon_0 r}$

It is necessary to change variables in Schrödinger’s equation since the potential energy is a function of radial distance $r$ rather than the Cartesian coordinate variables $x$, $y$ and $z$. Given the spherical symmetry of the atom, it is sensible to proceed by changing the variables in Schrödinger’s equation to those of spherical polar coordinates (rather than changing the $r$ variable in $U$ to Cartesian coordinates using $r = \sqrt{x^2 + y^2 + z^2}$). Only the variables in the Laplacian part of Schrödinger’s equation need to be changed, so we can use the approach in my previous note on changing variables in Laplace’s equation to get

$\frac{1}{r^2} \frac{\partial }{\partial r}\big( r^2 \frac{\partial \psi}{\partial r}\big) + \frac{1}{r^2 \sin \theta}\frac{\partial }{\partial \theta}\big( \sin \theta \frac{\partial \psi}{\partial \theta} \big) + \frac{1}{r^2 \sin^2 \theta}\frac{\partial^2 \psi}{\partial \phi^2} + \frac{2m_e}{\hbar^2}(E - U) \psi = 0$

I will now temporarily simplify things by using the representation of the square of the angular momentum operator in spherical polar coordinates which I obtained in another previous note, namely

$L^2 = - \hbar^2 \big( \frac{1}{\sin \theta}\frac{\partial}{\partial \theta} \big( \sin \theta \frac{\partial}{\partial \theta}\big) + \frac{1}{\sin^2 \theta}\frac{\partial^2}{\partial \phi^2} \big)$

$= - \hbar^2 r^2 \big( \frac{1}{r^2 \sin \theta}\frac{\partial}{\partial \theta} \big( \sin \theta \frac{\partial}{\partial \theta}\big) + \frac{1}{r^2 \sin^2 \theta}\frac{\partial^2}{\partial \phi^2} \big)$

Using this to replace the two middle terms in Schrödinger’s equation and rearranging we get

$\frac{1}{r^2} \frac{\partial }{\partial r}\big( r^2 \frac{\partial \psi}{\partial r}\big) + \frac{2m_e}{\hbar^2}(E - U) \psi = \frac{L^2}{\hbar^2 r^2} \psi$

This equation can now be solved by the usual separation of variables approach. We assume that the $\psi$ function can be expressed as a product

$\psi(r, \theta, \phi) = R(r) Y(\theta, \phi)$

and then substitute this back into the wave equation to get

$\frac{Y}{r^2} \frac{d}{ d r}\big( r^2 \frac{d R}{d r}\big) + \frac{2m_e}{\hbar^2}(E - U) R Y = \frac{R}{\hbar^2 r^2} L^2 Y$

Dividing through by $\frac{R Y}{r^2}$ we get

$\frac{1}{R} \frac{d}{d r}\big( r^2 \frac{d R}{d r}\big) + \frac{2m_e r^2}{\hbar^2}(E - U) = \frac{1}{\hbar^2 Y} L^2 Y$

Since the left-hand side of this equation depends only on $r$ while the right-hand side depends only on $\theta$ and $\phi$, both sides must be equal to some constant which we can call $\lambda$. Setting the left and right-hand sides equal to $\lambda$ in turn and rearranging slightly we finally get the radial equation

$\frac{1}{r^2} \frac{d}{d r}\big( r^2 \frac{d R}{d r}\big) + \big[ \frac{2m_e}{\hbar^2}(E - U) - \frac{\lambda}{r^2} \big] R = 0$

and the angular equation

$L^2 Y = \lambda \hbar^2 Y$

We can now apply separation of variables again to the angular equation. Rewriting the operator $L^2$ in full the angular equation becomes

$- \hbar^2 \big( \frac{1}{\sin \theta}\frac{\partial}{\partial \theta} \big( \sin \theta \frac{\partial Y}{\partial \theta}\big) + \frac{1}{\sin^2 \theta}\frac{\partial^2 Y}{\partial \phi^2} \big) = \lambda \hbar^2 Y$

which simplifies to

$\frac{1}{\sin \theta}\frac{\partial}{\partial \theta} \big( \sin \theta \frac{\partial Y}{\partial \theta}\big) + \frac{1}{\sin^2 \theta}\frac{\partial^2 Y}{\partial \phi^2} + \lambda Y = 0$

We assume that the $Y$ function can be written as the product

$Y(\theta, \phi) = \Theta (\theta) \Phi (\phi)$

Substituting this into the angular equation gives

$\frac{\Phi}{\sin \theta}\frac{d}{d \theta} \big( \sin \theta \frac{d \Theta}{d \theta}\big) + \frac{\Theta}{\sin^2 \theta}\frac{d^2 \Phi}{d \phi^2} + \lambda \Theta \Phi = 0$

Multiplying through by $\frac{\sin^2 \theta}{\Theta \Phi}$ and rearranging we get

$-\frac{1}{\Phi} \frac{d^2 \Phi}{d \phi^2} = \frac{\sin^2 \theta}{\Theta}\big[ \frac{1}{\sin \theta}\frac{d}{d \theta}\big(\sin \theta \frac{d \Theta}{d \theta}\big) + \lambda \Theta\big]$

Since the left-hand side of this equation depends only on $\phi$ while the right-hand side depends only on $\theta$, both sides must be equal to some constant which we can provisionally call $k$. Setting the left and right-hand sides equal to $k$ in turn and rearranging we get

$\frac{d^2 \Phi}{d \phi^2} + k \Phi = 0$

and

$\frac{1}{\sin \theta}\frac{d}{d \theta}\big(\sin \theta \frac{d \Theta}{d \theta}\big) + \big( \lambda - \frac{k}{\sin^2 \theta}\big) \Theta = 0$

We now have three ordinary differential equations that need to be solved, one for $\Phi$, one for $\Theta$ and one for $R$. We will solve each of them in turn.

The equation for $\Phi$

The equation for $\Phi$ is a straightforward second-order differential equation with auxiliary equation

$\zeta^2 + k = 0$

implying $\zeta = \pm \sqrt{-k}$ if $k < 0$ and $\zeta = \pm i \sqrt{k}$ if $k > 0$. Therefore it has a general solution of the form

$\Phi(\phi) = Ae^{\sqrt{-k} \phi} + Be^{- \sqrt{-k} \phi}$

if $k < 0$ and $\Phi(\phi) = Ae^{i \sqrt{k} \phi} + Be^{- i \sqrt{k} \phi}$ if $k > 0$, where $A$ and $B$ are arbitrary constants. Now, the azimuth angle $\phi$ can take any value in $(-\infty, \infty)$ but the function $\Phi$ must take a single value at each point in space (since this is a required property of the quantum wave function which $\Phi$ is a constituent of). It follows that the function $\Phi$ must be periodic since it must take the same value at $\phi$ and $\phi + 2\pi$ for any given $\phi$. This imposes two constraints on the form of the general solution: (1) it cannot consist only of exponential functions with real arguments since these are not periodic (thus ruling out the first general solution above and thereby implying that the separation constant $k$ must be nonnegative); (2) $\sqrt{k}$ must be an integer. Given these constraints, it is customary in quantum mechanics to denote $\pm \sqrt{k}$ by the letter $m$ (it is called the magnetic quantum number) and to specify the separation constant in the angular equations as $m^2$, which guarantees its nonnegativity. We then state the general solution of the equation for $\Phi$ as

$\Phi(\phi) = Ae^{i m \phi} + Be^{- i m \phi}$

In practice, a particular electron wave function will involve only a single value of $m$ so only the first of the two terms in the general solution will be necessary. We can therefore state the general solution of the equation for $\Phi$ for a given magnetic quantum number $m$ as

$\Phi(\phi) \propto e^{i m \phi}$
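The integer restriction on $m$ can be illustrated numerically: $e^{i m \phi}$ returns to its starting value after a full turn $\phi \rightarrow \phi + 2\pi$ only when $m$ is an integer. A quick sketch (the function name `Phi` is ours):

```python
import cmath

def Phi(m, phi):
    """Azimuthal factor Phi(phi) = e^{i m phi}."""
    return cmath.exp(1j * m * phi)

phi = 0.7
# integer m: single-valued, Phi(phi + 2 pi) == Phi(phi)
assert abs(Phi(2, phi + 2 * cmath.pi) - Phi(2, phi)) < 1e-12
# non-integer m: the wave function would be multivalued, so such m are excluded
assert abs(Phi(2.5, phi + 2 * cmath.pi) - Phi(2.5, phi)) > 1.0
```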

The equation for $\Theta$

Given that we now know the separation constant for the angular equations is either zero or a positive square number $k = m^2$, we can write the equation for $\Theta$ as

$\frac{1}{\sin \theta}\frac{d}{d \theta}\big(\sin \theta \frac{d \Theta}{d \theta}\big) + \big( \lambda - \frac{m^2}{\sin^2 \theta}\big) \Theta = 0$

Expanding the first term we get

$\frac{1}{\sin \theta} \cos \theta \frac{d \Theta}{d \theta} + \frac{d^2 \Theta}{d \theta^2} + \big( \lambda - \frac{m^2}{\sin^2 \theta}\big) \Theta = 0$

I am now going to multiply and divide the first two terms by $\sin^2 \theta$ to get

$\sin^2 \theta \big(\frac{\cos \theta}{\sin^3 \theta} \frac{d \Theta}{d \theta} + \frac{1}{\sin^2 \theta} \frac{d^2 \Theta}{d \theta^2} \big) + \big( \lambda - \frac{m^2}{\sin^2 \theta}\big) \Theta = 0$

Adding and subtracting the term $\frac{2 \cos \theta}{\sin^3 \theta} \frac{d \Theta}{d \theta}$ inside the brackets, this is equivalent to

$\sin^2 \theta \big(- \frac{\cos \theta}{\sin^3 \theta} \frac{d \Theta}{d \theta} + \frac{1}{\sin^2 \theta} \frac{d^2 \Theta}{d \theta^2} + \frac{2 \cos \theta}{\sin^3 \theta} \frac{d \Theta}{d \theta}\big) + \big( \lambda - \frac{m^2}{\sin^2 \theta}\big) \Theta = 0$

Now we can make the change of variable $x = \cos \theta$ which implies $dx = - \sin \theta d \theta$ and therefore

$\frac{d \theta}{dx} = - \frac{1}{\sin \theta}$

$\frac{d \Theta}{d x} = \frac{d \Theta}{d \theta} \frac{d \theta}{d x} = - \frac{1}{\sin \theta} \frac{d \Theta}{d \theta}$

$\frac{d^2 \Theta}{d x^2} = \frac{d}{d \theta} \big[ - \frac{1}{\sin \theta} \frac{d \Theta}{d \theta} \big] \frac{d \theta}{d x} = - \frac{\cos \theta}{\sin^3 \theta} \frac{d \Theta}{d \theta} + \frac{1}{\sin^2 \theta} \frac{d^2 \Theta}{d \theta^2}$

Using these in the amended form of the $\Theta$ equation together with the fact that $\sin^2 \theta$ = $1 - x^2$, the $\Theta$ equation becomes

$(1 - x^2) \big(\frac{d^2 \Theta}{d x^2} - \frac{2x}{1 - x^2} \frac{d \Theta}{d x} \big) + \big(\lambda - \frac{m^2}{1 - x^2} \big) \Theta = 0$

$\iff$

$(1 - x^2) \frac{d^2 \Theta}{d x^2} - 2x \frac{d \Theta}{d x} + \big(\lambda - \frac{m^2}{1 - x^2} \big) \Theta = 0$

We will solve this equation first for the case $m = 0$ (the solutions will be Legendre polynomials) and use these results to construct solutions for the case $m \neq 0$ (the solutions here will be the associated Legendre functions). Setting $m = 0$ we get

$(1 - x^2) \frac{d^2 \Theta}{d x^2} - 2x \frac{d \Theta}{d x} + \lambda \Theta = 0$

which has the form of a well known differential equation known as Legendre’s equation. It can be solved by assuming a series solution of the form

$\Theta = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + \cdots + a_n x^n + \cdots$

and then differentiating it term by term twice to get

$\Theta^{\prime} = a_1 + 2 a_2 x + 3 a_3 x^2 + 4 a_4 x^3 + \cdots + n a_n x^{n-1} + \cdots$

and

$\Theta^{\prime \prime} = 2 a_2 + 6 a_3 x + 12 a_4 x^2 + 20 a_5 x^3 + \cdots + n (n - 1) a_n x^{n-2} + \cdots$

We now substitute these into Legendre’s equation and set the coefficient of each power of $x$ equal to zero (because $\Theta$ must satisfy Legendre’s equation identically). We find that the coefficient of the $x^n$ term satisfies

$(n + 2)(n + 1) a_{n + 2} + (\lambda - n(n + 1)) a_n = 0$

which implies

$a_{n + 2} = - \frac{(\lambda - n(n + 1))}{(n + 2)(n + 1)} a_n$

This formula makes it possible to find any even coefficient as a multiple of $a_0$ and any odd coefficient as a multiple of $a_1$. The general solution of our Legendre equation is then a sum of two series involving two arbitrary constants $a_0$ and $a_1$:

$\Theta = a_0 \bigg \{1 - \frac{\lambda}{2!} x^2 + \frac{\lambda (\lambda - 6)}{4!} x^4 - \frac{\lambda(\lambda - 6)(\lambda - 20)}{6!} x^6 + \cdots \bigg \}$

$+ a_1 \bigg \{x - \frac{(\lambda - 2)}{3!} x^3 + \frac{(\lambda - 2)(\lambda - 12)}{5!} x^5 - \frac{(\lambda - 2)(\lambda - 12)(\lambda - 30)}{7!} x^7 + \cdots \bigg \}$

Both of the series in this sum converge for $x^2 < 1$ but in general they do not converge for $x^2 = 1$. This is a problem for us because in our change of variables we set $x = \cos \theta$ and we want solutions that converge for all possible values of $\theta$ including those that result in $x^2 = 1$. It turns out that the only way to get such solutions is to choose integer values of $\lambda$ that make either the $a_0$ or the $a_1$ series in the above sum terminate (the other series will generally be divergent so we remove it by setting the corresponding arbitrary constant equal to zero). This requires $\lambda$ to take values in the quadratic sequence $0$, $2$, $6$, $12$, $20$, $30$, $42$, $56 \ldots$ The $l$-th term of this sequence is $l(l + 1)$, so the separation constant $\lambda$ must be of this form, i.e., $\lambda = l(l + 1)$ for some $l = 0, 1, 2, 3, \dots$. When $l$ takes an even value the $a_0$ series will terminate and we can set $a_1 = 0$ to make the other series vanish. Conversely, when $l$ takes an odd value the $a_1$ series will terminate and we can set $a_0 = 0$ to make the other series vanish.

From the eigenvalue equation for $L^2$ given earlier ($L^2 Y = \lambda \hbar^2 Y$) it is clear that the magnitude of the orbital angular momentum is $L = \sqrt{l(l + 1)} \hbar$. It is interesting to see how the form of this arises mathematically from considering series solutions to Legendre’s equation above. The parameter $l$ is called the orbital angular momentum quantum number.

Note that negative integral values of $l$ are allowed but they simply give solutions already obtained for positive values. For example, $l = -2$ gives $\lambda = 2$ and this makes the $a_1$ series terminate, yielding the polynomial solution

$\Theta = a_1 x$

This is exactly the same solution as the one that would be obtained if $l = 1$. It is therefore customary to restrict $l$ to nonnegative values. Each possible value of $l$ gives a polynomial solution to Legendre’s equation. For $l = 0$ we get $\Theta = a_0$, for $l = 1$ we get $\Theta = a_1 x$, for $l = 2$ we get $\Theta = a_0 - 3 a_0 x^2$, and so on. If the value of $a_0$ or $a_1$ in each polynomial equation is selected so that $\Theta = 1$ when $x = 1$ the resulting polynomials are called Legendre polynomials, denoted by $P_l(x)$. Given that for each $l$ we have $P_l(1) = 1$ the first few Legendre polynomials are

$P_0(x) = 1$ $P_1(x) = x$

$P_2(x) = \frac{1}{2}(3 x^2 - 1)$

$P_3(x) = \frac{1}{2}\big(5 x^3 - 3x \big)$

These are the physically acceptable solutions to Legendre’s equation for $\Theta$ above.
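The construction of these polynomials from the terminating series, including the normalization $P_l(1) = 1$, can be reproduced with exact rational arithmetic; a minimal sketch (the helper name `legendre_poly` is ours):

```python
from fractions import Fraction

def legendre_poly(l):
    """Coefficients of P_l(x) (index = power of x), built from the terminating
    series solution with lam = l(l+1) and normalized so that P_l(1) = 1."""
    lam = l * (l + 1)
    a = [Fraction(0)] * (l + 1)
    a[l % 2] = Fraction(1)            # even series for even l, odd series for odd l
    for n in range(l % 2, l - 1, 2):
        a[n + 2] = -Fraction(lam - n * (n + 1), (n + 2) * (n + 1)) * a[n]
    norm = sum(a)                     # the polynomial evaluated at x = 1
    return [c / norm for c in a]

# reproduces the listed polynomials, e.g. P_2(x) = (3x^2 - 1)/2
assert legendre_poly(2) == [Fraction(-1, 2), 0, Fraction(3, 2)]
assert legendre_poly(3) == [0, Fraction(-3, 2), 0, Fraction(5, 2)]
```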

We now consider the solutions for $m \neq 0$ of the equation

$(1 - x^2) \frac{d^2 \Theta}{d x^2} - 2x \frac{d \Theta}{d x} + \big(\lambda - \frac{m^2}{1 - x^2} \big) \Theta = 0$

We now know that $\lambda = l(l + 1)$ so we can substitute this in, and we can also add the subscript $l$ to $m$ since the solutions to this equation will involve a link between the values of the orbital angular momentum and magnetic quantum numbers. The equation we need to solve becomes

$(1 - x^2) \frac{d^2 \Theta}{d x^2} - 2x \frac{d \Theta}{d x} + \big[l(l + 1) - \frac{m_l^2}{1 - x^2} \big] \Theta = 0$

The link between $l$ and $m_l$ arises from the fact that we are constrained in trying to solve this equation: it encompasses the case $m_l = 0$ for which the physically acceptable solutions are the Legendre polynomials $P_l(x)$. Therefore the physically allowable solutions for the above equation must include the Legendre polynomials as a special case. We can find these by using the series approach again and it turns out that the physically acceptable solutions are the so-called associated Legendre functions which take the form

$P_l^{m_l}(x) = (1 - x^2)^{m_l/2}\frac{d^{m_l}}{d x^{m_l}}P_l(x)$

Now, each Legendre polynomial $P_l(x)$ is a polynomial of degree $l$. Therefore the $m_l$-th order derivative in $P_l^{m_l}$ will equal zero if $|m_l| > l$, so for physically acceptable solutions we must impose the constraint $|m_l| \leq l$ in the differential equation for $\Theta$. This is where the link between the quantum numbers $l$ and $m_l$ comes from in the quantum theory of the hydrogen atom: given a value of $l$ the acceptable values of $m_l$ are integers in the range $-l \leq m_l \leq l$.

Finally, note two things: (1) $P_l^{m_l}(x)$ reduces to the Legendre polynomial $P_l(x)$ when $m_l = 0$, which is what we needed. (2) A negative value for $m_l$ does not change $m_l^2$ in the original differential equation so a solution for positive $m_l$ is also a solution for the corresponding negative $m_l$. Thus many references define the associated Legendre function $P_l^{m_l}(x)$ for $-l \leq m_l \leq l$ as $P_l^{|m_l|}(x)$.
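A short script can reproduce this, hardcoding the $P_l$ listed above and differentiating term by term (all names are ours); it also confirms that $P_l^{m_l}$ vanishes when $m_l > l$:

```python
import math

# P_l(x) as coefficient lists (index = power of x), taken from the list above
P = {2: [-0.5, 0.0, 1.5], 3: [0.0, -1.5, 0.0, 2.5]}

def poly_deriv(c, m):
    """Differentiate a coefficient-list polynomial m times."""
    for _ in range(m):
        c = [k * c[k] for k in range(1, len(c))]
    return c or [0.0]

def assoc_legendre(l, m, x):
    """P_l^m(x) = (1 - x^2)^{m/2} d^m/dx^m P_l(x), for m >= 0."""
    d = poly_deriv(P[l], m)
    return (1 - x * x) ** (m / 2) * sum(ck * x ** k for k, ck in enumerate(d))

# m = 0 recovers P_l itself: P_2(0.5) = (3/4 - 1)/2 = -0.125
assert abs(assoc_legendre(2, 0, 0.5) + 0.125) < 1e-12
# P_2^1(x) = 3x sqrt(1 - x^2)
assert abs(assoc_legendre(2, 1, 0.5) - 1.5 * math.sqrt(0.75)) < 1e-12
# the m-th derivative of a degree-l polynomial vanishes when m > l
assert assoc_legendre(2, 3, 0.5) == 0.0
```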

To conclude, given values for the quantum numbers $l$ and $m_l$, the general solution of the equation for $\Theta$ can be written as

$\Theta(\theta) \propto P_l^{m_l}(\cos \theta)$

To clarify where the principal quantum number comes from, the final equation we need to deal with is the radial equation

$\frac{1}{r^2} \frac{d}{d r}\big( r^2 \frac{d R}{d r}\big) + \big[ \frac{2m_e}{\hbar^2}(E - U) - \frac{\lambda}{r^2} \big] R = 0$

Writing $\lambda = l(l + 1)$ and replacing $U$ with the formula for the potential energy we get

$\frac{1}{r^2} \frac{d}{d r}\big( r^2 \frac{d R}{d r}\big) + \big[ \frac{2m_e}{\hbar^2}\big(\frac{e^2}{4 \pi \epsilon_0 r} + E\big) - \frac{l(l + 1)}{r^2} \big] R = 0$

$\iff$

$\frac{d^2 R}{d r^2} + \frac{2}{r} \frac{d R}{d r} + \frac{2m_e}{\hbar^2} \big[ E + \frac{e^2}{4 \pi \epsilon_0 r} - \frac{l(l + 1) \hbar^2}{2 m_e r^2} \big] R = 0$

We are only interested in solutions for which the electron is bound within the atom, so we take $E < 0$ (the negative energy of the electron is the amount of energy that must be supplied to it to free it from the atom). In order to solve the above equation it is customary to make the change of variable

$\rho = \big(-\frac{8 m_e E}{\hbar^2}\big)^{1/2} r$

and define the dimensionless constant

$\tau = \frac{e^2}{4 \pi \epsilon_0 \hbar} \big(-\frac{m_e}{2 E} \big)^{1/2}$

If we then specify $R = R(\rho)$ we have

$\frac{d R}{d r} = \frac{d R}{d \rho} \frac{d \rho}{d r} = \big(-\frac{8 m_e E}{\hbar^2}\big)^{1/2} \frac{d R}{d \rho}$

$\frac{2}{r} \frac{d R}{d r} = \big(-\frac{8 m_e E}{\hbar^2}\big) \frac{2}{\rho} \frac{d R}{d \rho}$

$\frac{d^2 R}{d r^2} = \big(-\frac{8 m_e E}{\hbar^2}\big)^{1/2} \frac{d^2 R}{d \rho^2} \frac{d \rho}{d r} = \big(-\frac{8 m_e E}{\hbar^2}\big) \frac{d^2 R}{d \rho^2}$

$\frac{2 m_e}{\hbar^2} E + \frac{2 m_e}{\hbar^2} \frac{e^2}{4 \pi \epsilon_0 r} = \big(-\frac{8 m_e E}{\hbar^2}\big) \big \{\frac{1}{4}\frac{e^2}{4 \pi \epsilon_0 r} \big(-\frac{1}{E}\big) - \frac{1}{4}\big \} = \big(-\frac{8 m_e E}{\hbar^2}\big) \big(\frac{\tau}{\rho} - \frac{1}{4}\big)$

$\frac{l(l + 1)}{r^2} = \big(-\frac{8 m_e E}{\hbar^2}\big) \frac{l(l + 1)}{\rho^2}$

Using these results we can rewrite the differential equation as

$\big(-\frac{8 m_e E}{\hbar^2}\big) \bigg \{\frac{d^2 R}{d \rho^2} + \frac{2}{\rho} \frac{d R}{d \rho} + \big[\frac{\tau}{\rho} - \frac{1}{4} - \frac{l(l + 1)}{\rho^2}\big] R(\rho) \bigg \} = 0$

$\iff$

$\frac{d^2 R}{d \rho^2} + \frac{2}{\rho} \frac{d R}{d \rho} + \big[\frac{\tau}{\rho} - \frac{1}{4} - \frac{l(l + 1)}{\rho^2}\big] R(\rho) = 0$

To make further progress we consider the behaviour of this differential equation as $\rho \rightarrow \infty$. It reduces to

$\frac{d^2 R}{d \rho^2} - \frac{1}{4} R = 0$

which is a straightforward second-order differential equation with auxiliary equation

$\zeta^2 - \frac{1}{4} = 0$

$\implies \zeta = \pm \frac{1}{2}$

The positive solution to the auxiliary equation implies a term in the general solution of the form $e^{\rho/2}$ which is unacceptable since it explodes as $\rho \rightarrow \infty$. Therefore we only accept the negative solution to the auxiliary equation and the general solution for $R$ as $\rho \rightarrow \infty$ must be of the form

$R \propto e^{-\rho/2}$

This suggests we can try an exact solution of the full differential equation of the form

$R = e^{-\rho/2} F(\rho)$

Differentiating this twice we get

$\frac{d R}{d \rho} = -\frac{1}{2} e^{-\rho/2} F(\rho) + e^{-\rho/2} F^{\prime} (\rho)$

$\frac{d^2 R}{d \rho^2} = \frac{1}{4} e^{-\rho/2} F(\rho) - \frac{1}{2} e^{-\rho/2} F^{\prime}(\rho) - \frac{1}{2} e^{-\rho/2} F^{\prime}(\rho) + e^{-\rho/2} F^{\prime \prime} (\rho)$

Substituting these into the differential equation

$\frac{d^2 R}{d \rho^2} + \frac{2}{\rho} \frac{d R}{d \rho} + \big[\frac{\tau}{\rho} - \frac{1}{4} - \frac{l(l + 1)}{\rho^2}\big] R(\rho) = 0$

gives

$F^{\prime \prime}(\rho) + \frac{(2 - \rho)}{\rho} F^{\prime}(\rho) + \big[\frac{(\tau - 1)}{\rho} - \frac{l(l + 1)}{\rho^2}\big] F(\rho) = 0$

$\iff$

$\rho^2 F^{\prime \prime}(\rho) + \rho (2 - \rho) F^{\prime}(\rho) + \big[\rho (\tau - 1) - l(l + 1)\big] F(\rho) = 0$

We can now try to solve this latest version of the differential equation by the method of Frobenius, which involves assuming a generalised power series solution of the form

$F(\rho) = a_0 \rho^s + a_1 \rho^{s + 1} + a_2 \rho^{s + 2} + \cdots$

Differentiating twice we get

$F^{\prime}(\rho) = s a_0 \rho^{s-1} + (s + 1) a_1 \rho^s + (s + 2) a_2 \rho^{s + 1} + \cdots$

$F^{\prime \prime}(\rho) = (s - 1) s a_0 \rho^{s-2} + s (s + 1) a_1 \rho^{s-1} + (s + 1) (s + 2) a_2 \rho^s + \cdots$

Then the terms appearing in the differential equation have the generalised power series forms

$\rho^2 F^{\prime \prime}(\rho) = (s - 1) s a_0 \rho^s + s (s + 1) a_1 \rho^{s+1} + (s + 1) (s + 2) a_2 \rho^{s+2} + \cdots$

$2 \rho F^{\prime}(\rho) = 2 s a_0 \rho^s + 2 (s + 1) a_1 \rho^{s+1} + 2 (s + 2) a_2 \rho^{s + 2} + \cdots$

$-\rho^2 F^{\prime}(\rho) = - s a_0 \rho^{s+1} - (s + 1) a_1 \rho^{s+2} - (s + 2) a_2 \rho^{s + 3} - \cdots$

$(\tau - 1) \rho F(\rho) = (\tau - 1) a_0 \rho^{s+1} + (\tau - 1) a_1 \rho^{s + 2} + (\tau - 1) a_2 \rho^{s + 3} + \cdots$

$-l(l + 1) F(\rho) = -l(l + 1) a_0 \rho^s - l(l + 1) a_1 \rho^{s + 1} - l(l + 1) a_2 \rho^{s + 2} - \cdots$

Summing these terms (remembering that the sum must be identically equal to zero) we find the coefficient of $\rho^s$ to be

$[s(s - 1) + 2s - l(l+1)]a_0 = 0$

$\implies$

$s(s + 1) - l(l+1) = 0$

$\implies$

$s = l$ or $s = -l - 1$

Now, when $s = -l - 1$ the first term of the power series for $F(\rho)$ is $a_0/\rho^{l+1}$ which explodes as $\rho \rightarrow 0$. This is unacceptable so we discard this solution and set $s = l$.

For the coefficient of $\rho^{s + n}$ we get

$[(s+n)(s+n-1) + 2(s+n) - l(l+1)]a_n + [(\tau - 1) - (s+n-1)]a_{n-1} = 0$

Setting $s = l$ and rearranging gives us the recurrence equation

$a_n = \frac{(l + n - \tau)}{(l+n+1)(l+n) - l(l+1)} a_{n-1}$

From this recurrence equation we observe that

$a_n \approx \frac{1}{n}a_{n-1}$

for large $n$, which implies $a_n \sim \frac{1}{n!}a_0$. We deduce from this that the series for $F(\rho)$ behaves like $a_0 \rho^l \sum \frac{\rho^n}{n!}$ for large $\rho$ and therefore $R = e^{-\rho/2} F(\rho)$ behaves like $a_0 \rho^l e^{\rho/2}$. However, this diverges as $\rho \rightarrow \infty$, which is unacceptable, so we conclude that the series for $F(\rho)$ must terminate at some value of $n$ which we will call $N$. In this case we have $a_{N+1} = 0$, which the recurrence equation tells us can only happen if

$\tau = l + N + 1 \equiv \tilde{n}$

This is how the principal quantum number $\tilde{n}$ first appears. Now, we have

$\tau = \frac{e^2}{4 \pi \epsilon_0 \hbar} \big(-\frac{m_e}{2 E} \big)^{1/2} = \tilde{n}$

$\iff$

$\big(\frac{e^2}{4 \pi \epsilon_0}\big)^2 \big(-\frac{m_e}{2 \hbar^2}\big) \frac{1}{E} = \tilde{n}^2$

$\iff E_{\tilde{n}} = \big(-\frac{m_e}{2 \hbar^2}\big) \big(\frac{e^2}{4 \pi \epsilon_0}\big)^2 \frac{1}{\tilde{n}^2}$

These are the famous bound-state energy eigenvalues for $\tilde{n} = 1, 2, \ldots$. This is the same formula for the energy levels of the hydrogen atom that Niels Bohr obtained by intuitive means in his 1913 ‘solar system’ model of atomic structure.
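Plugging in numerical values confirms the familiar scale of these energies; a sketch assuming 2018 CODATA values for the constants (these are not given in the text):

```python
import math

# 2018 CODATA values (assumed here, not taken from the text), SI units
m_e  = 9.1093837015e-31   # electron mass, kg
q_e  = 1.602176634e-19    # elementary charge, C
hbar = 1.054571817e-34    # reduced Planck constant, J s
eps0 = 8.8541878128e-12   # vacuum permittivity, F/m

def E_n(n):
    """E_n = -(m_e / 2 hbar^2)(e^2 / 4 pi eps0)^2 / n^2, in joules."""
    return -(m_e / (2 * hbar ** 2)) * (q_e ** 2 / (4 * math.pi * eps0)) ** 2 / n ** 2

assert abs(E_n(1) / q_e + 13.6057) < 1e-3   # ground state: about -13.6 eV
assert abs(E_n(2) / E_n(1) - 0.25) < 1e-12  # energies scale as 1/n^2
```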

As stated above, the integer $\tilde{n}$ is called the principal quantum number. Recall that $\tilde{n} = l + N + 1$ and $N$ cannot be smaller than zero. It follows that

$\tilde{n} - l - 1 \geq 0$

$\iff$

$l \leq \tilde{n} - 1$

This explains why for a given value of $\tilde{n}$ the allowable values of $l$ are $l = 0, 1, 2, \dots, (\tilde{n}-1)$.
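Both quantisation conditions derived here, the termination of the radial series at $\tau = l + N + 1$ and the resulting count of allowed $(l, m_l)$ pairs for a given $\tilde{n}$, can be checked with a short script (exact rational arithmetic for the recurrence; all names are ours):

```python
from fractions import Fraction

def radial_coeffs(l, tau, n_terms=12):
    """Coefficients a_n of the series for F(rho), from the recurrence
    a_n = (l + n - tau) / ((l + n + 1)(l + n) - l(l + 1)) * a_{n-1}, a_0 = 1."""
    a = [Fraction(1)]
    for n in range(1, n_terms):
        a.append(Fraction(l + n - tau, (l + n + 1) * (l + n) - l * (l + 1)) * a[-1])
    return a

def allowed_l_m(n):
    """Allowed (l, m_l) pairs for principal quantum number n."""
    return [(l, m) for l in range(n) for m in range(-l, l + 1)]

# tau = l + N + 1 kills every coefficient beyond a_N, leaving a polynomial
l, N = 1, 2
a = radial_coeffs(l, l + N + 1)
assert a[N] != 0 and all(c == 0 for c in a[N + 1:])

# for each n the (l, m_l) combinations number n^2 (ignoring spin)
assert len(allowed_l_m(1)) == 1 and len(allowed_l_m(3)) == 9
assert allowed_l_m(2) == [(0, 0), (1, -1), (1, 0), (1, 1)]
```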

Returning to the solution of

$\rho^2 F^{\prime \prime}(\rho) + \rho (2 - \rho) F^{\prime}(\rho) + \big[\rho (\tau - 1) - l(l + 1)\big] F(\rho) = 0$

the above discussion suggests that we should look for a solution of the form

$F(\rho) = a_0 \rho^l L(\rho)$

where $L(\rho)$ is a polynomial (rather than an infinite series). Differentiating this twice gives

$F^{\prime}(\rho) = a_0 l \rho^{l-1}L(\rho) + a_0 \rho^l L^{\prime}(\rho)$

$F^{\prime \prime} (\rho) = a_0 (l-1) l \rho^{l-2}L(\rho) + 2 a_0 l \rho^{l-1} L^{\prime}(\rho) + a_0 \rho^l L^{\prime \prime}(\rho)$

Substituting these into the differential equation and setting $\tau = \tilde{n}$ we get

$\rho^{l+2} L^{\prime \prime}(\rho) + (2l + 2 - \rho) \rho^{l+1}L^{\prime}(\rho) + (\tilde{n} - 1 - l)\rho^{l+1}L(\rho) = 0$

$\iff$

$\rho L^{\prime \prime}(\rho) + (2l + 2 - \rho) L^{\prime}(\rho) + (\tilde{n} - 1 - l) L(\rho) = 0$

$\iff$

$\rho L^{\prime \prime}(\rho) + (\alpha + 1 - \rho) L^{\prime}(\rho) + \tilde{n}^{*} L(\rho) = 0$

where $\alpha \equiv 2l + 1$ and $\tilde{n}^{*} \equiv \tilde{n} - 1 - l$. This last form is a well known differential equation whose physically acceptable solutions in the present context are associated Laguerre polynomials given by the formula

$L_{\tilde{n}^{*}}^{(\alpha)}(\rho) = \sum_{j = 0}^{\tilde{n}^{*}} (-1)^j \frac{(\tilde{n}^{*} + \alpha)!}{(\tilde{n}^{*}-j)!(\alpha + j)!}\frac{\rho^j}{j!}$
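Since $\alpha = 2l + 1$ and $\tilde{n}^{*} = \tilde{n} - 1 - l$ are nonnegative integers here, this sum can be evaluated directly with plain factorials; a sketch (function name ours), checked against the known small cases $L_0^{(\alpha)} = 1$ and $L_1^{(\alpha)}(\rho) = \alpha + 1 - \rho$:

```python
from math import factorial

def assoc_laguerre(n_star, alpha, rho):
    """L_{n*}^{(alpha)}(rho) from the closed-form sum; alpha = 2l + 1 and
    n* = n - l - 1 are nonnegative integers here, so plain factorials suffice."""
    return sum((-1) ** j * factorial(n_star + alpha)
               / (factorial(n_star - j) * factorial(alpha + j))
               * rho ** j / factorial(j)
               for j in range(n_star + 1))

# known small cases: L_0^{(a)} = 1, L_1^{(a)}(x) = a + 1 - x, L_2^{(1)}(x) = 3 - 3x + x^2/2
assert assoc_laguerre(0, 3, 1.7) == 1.0
assert abs(assoc_laguerre(1, 2, 0.5) - 2.5) < 1e-12
assert abs(assoc_laguerre(2, 1, 2.0) + 1.0) < 1e-12
```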

For given quantum numbers $\tilde{n}$ and $l$, the solution of the radial equation for $R$ is then

$R_{\tilde{n}l}(r) \propto e^{-\rho/2}\rho^l L_{\tilde{n} - l - 1}^{(2l + 1)}(\rho)$

Final form of the electron wave function $\psi$

Putting everything together, for given principal quantum number $\tilde{n}$, orbital quantum number $l$ and magnetic quantum number $m_l$, the wave function of the electron in the hydrogen atom is

$\psi_{\tilde{n} l m_l}(r, \theta, \phi) \propto e^{-\rho/2}\rho^l L_{\tilde{n} - l - 1}^{(2l + 1)}(\rho) P_l^{m_l}(\cos \theta) e^{i m_l \phi}$

where

$\rho = \big(-\frac{8 m_e E_{\tilde{n}}}{\hbar^2}\big)^{1/2} r$

and

$E_{\tilde{n}} = \big(-\frac{m_e}{2 \hbar^2}\big) \big(\frac{e^2}{4 \pi \epsilon_0}\big)^2 \frac{1}{\tilde{n}^2}$
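A short numerical check (again assuming 2018 CODATA constants) shows that the scale factor in $\rho$ works out to $\rho = 2r/(\tilde{n} a_0)$, where $a_0 = 4 \pi \epsilon_0 \hbar^2 / (m_e e^2)$ is the Bohr radius:

```python
import math

# 2018 CODATA values (assumed, SI units)
m_e, q_e = 9.1093837015e-31, 1.602176634e-19
hbar, eps0 = 1.054571817e-34, 8.8541878128e-12

a0 = 4 * math.pi * eps0 * hbar ** 2 / (m_e * q_e ** 2)   # Bohr radius, metres

def E_n(n):
    return -(m_e / (2 * hbar ** 2)) * (q_e ** 2 / (4 * math.pi * eps0)) ** 2 / n ** 2

def rho_over_r(n):
    """The factor multiplying r in rho = (-8 m_e E_n / hbar^2)^{1/2} r."""
    return math.sqrt(-8 * m_e * E_n(n) / hbar ** 2)

assert abs(a0 - 5.29177e-11) < 1e-15                  # a0 ~ 0.529 angstrom
assert abs(rho_over_r(1) * a0 - 2.0) < 1e-9           # rho = 2 r / a0 for n = 1
assert abs(rho_over_r(3) * a0 - 2.0 / 3.0) < 1e-9     # rho = 2 r / (n a0) in general
```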

# Changing variables in the square of the quantum mechanical angular momentum operator

A particle of mass $m$ with position vector $\mathbf{r}$ and velocity $\mathbf{v}$ (with respect to some specified origin) has a linear momentum vector $\mathbf{p} = m \mathbf{v}$ and angular momentum vector

$\mathbf{L} = \mathbf{r} \times \mathbf{p} = m \mathbf{r} \times \mathbf{v}$

where $\times$ is the vector product operation. The magnitude of the angular momentum vector is $L = rmv\sin \theta$ where $\theta$ is the angle between $\mathbf{r}$ and $\mathbf{v}$. The direction of $\mathbf{L}$ is given by the right-hand rule when the vectors $\mathbf{r}$ and $\mathbf{v}$ are placed with their tails at the same point: one curls the fingers of the right hand in the direction of rotation of $\mathbf{r}$ into $\mathbf{v}$ and the thumb then points in the direction of $\mathbf{L}$. One can find the components of $\mathbf{L}$ in the $x$, $y$ and $z$ directions in Cartesian coordinates using

$\mathbf{L} = \mathbf{r} \times \mathbf{p} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \ \\x & y & z \\ \ \\p_x & p_y & p_z \end{vmatrix}$

$= (yp_z - zp_y)\mathbf{i} + (zp_x - xp_z)\mathbf{j} + (xp_y - yp_x)\mathbf{k}$

Therefore the components of $\mathbf{L}$ in Cartesian coordinates are

$L_x = yp_z - zp_y$

$L_y = zp_x - xp_z$

$L_z = xp_y - yp_x$
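The determinant expansion above is just the componentwise cross product; a short (classical, numerical) sketch confirms the component formulas and that $\mathbf{L}$ is perpendicular to both $\mathbf{r}$ and $\mathbf{p}$:

```python
def angular_momentum(r, p):
    """Classical L = r x p, expanded componentwise as in the determinant above."""
    x, y, z = r
    px, py, pz = p
    return (y * pz - z * py, z * px - x * pz, x * py - y * px)

# r along x and p along y give L along +z, as the right-hand rule predicts
assert angular_momentum((1.0, 0.0, 0.0), (0.0, 2.0, 0.0)) == (0.0, 0.0, 2.0)

# L is perpendicular to both r and p: r . L = p . L = 0
r, p = (1.0, 2.0, 3.0), (-1.0, 0.5, 2.0)
L = angular_momentum(r, p)
assert abs(sum(ri * Li for ri, Li in zip(r, L))) < 1e-12
assert abs(sum(pi * Li for pi, Li in zip(p, L))) < 1e-12
```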

In classical mechanics the angular momentum magnitude $L$ can take a continuum of values but in quantum mechanics only certain discrete values for $L$ are permissible. Furthermore, the linear momentum vector $\mathbf{p}$ appearing in $\mathbf{L}$ must obey Heisenberg’s uncertainty principle in each direction, i.e.,

$\Delta x \Delta p_x \geq \frac{\hbar}{2}$

in the $x$-direction and similarly for the $y$ and $z$ directions. These features of quantized variables like $\mathbf{p}$ and $\mathbf{L}$ make it necessary to do calculations with them in quantum mechanics using their operator representations. (For example, on the quantum scale one cannot calculate the expectation of momentum in the $x-$direction using an integral involving momentum as a function of $x$ because no such function can exist: Heisenberg’s uncertainty principle prevents accurate knowledge of momentum when the value of $x$ is known exactly. One must therefore use the operator representation of momentum rather than momentum as some function of $x$ in order to calculate the expectation). It is a basic postulate of quantum mechanics that every observable quantity characterising a physical system can be represented by a quantum mechanical operator obtained by expressing the quantity in terms of $\mathbf{r}$ and $\mathbf{p}$ and then replacing the vector $\mathbf{p}$ by $- i \hbar \nabla$ where

$\nabla = \mathbf{i} \frac{\partial}{\partial x} + \mathbf{j} \frac{\partial}{\partial y} + \mathbf{k} \frac{\partial}{\partial z}$

and its components $p_x$, $p_y$ and $p_z$ by

$p_x = - i \hbar \frac{\partial}{\partial x}$

$p_y = - i \hbar \frac{\partial}{\partial y}$

$p_z = - i \hbar \frac{\partial}{\partial z}$

Taking this on board we can then write $\mathbf{L}$ and its components $L_x$, $L_y$ and $L_z$ in quantum mechanical operator form in Cartesian coordinates as

$\mathbf{L} = - i \hbar (\mathbf{r} \times \nabla)$

$L_x = - i \hbar \big( y \frac{\partial}{\partial z} - z \frac{\partial}{\partial y} \big)$

$L_y = - i \hbar \big( z \frac{\partial}{\partial x} - x \frac{\partial}{\partial z} \big)$

$L_z = - i \hbar \big( x \frac{\partial}{\partial y} - y \frac{\partial}{\partial x} \big)$

In order to perform some calculations involving Schrödinger’s equation I needed to employ the square of the quantum mechanical angular momentum operator $L^2$, but in spherical polar coordinates rather than Cartesian coordinates, where

$L^2 = L_x^2 + L_y^2 + L_z^2$

I used the matrix calculus approach in my previous note to achieve the necessary change of variables in $L^2$. In the present note I want to record the details of this calculation as I have never seen this approach used elsewhere. (This change of variables can also be done using a scale factor method based on vector calculus which I will not go into here).

As in my previous note we begin the matrix calculus approach with the standard conversion equations for spherical polar coordinates:

$x = r \sin \theta \cos \phi$

$y = r \sin \theta \sin \phi$

$z = r \cos \theta$

Differentiating with respect to the vector $(r, \theta, \phi)$ we get

$\begin{bmatrix} \frac{\partial x}{\partial r} & \frac{\partial y}{\partial r} & \frac{\partial z}{\partial r}\\ \ \\ \frac{\partial x}{\partial \theta} & \frac{\partial y}{\partial \theta} & \frac{\partial z}{\partial \theta}\\ \ \\ \frac{\partial x}{\partial \phi} & \frac{\partial y}{\partial \phi} & \frac{\partial z}{\partial \phi}\end{bmatrix} = \begin{bmatrix} \sin \theta \cos \phi & \sin \theta \sin \phi & \cos \theta\\ \ \\ r \cos \theta \cos \phi & r \cos \theta \sin \phi & - r \sin \theta\\ \ \\ - r \sin \theta \sin \phi & r \sin \theta \cos \phi & 0\end{bmatrix}$

We can then use the matrix version of the chain rule to write

$\begin{bmatrix} \frac{\partial F}{\partial r}\\ \ \\ \frac{\partial F}{\partial \theta} \\ \ \\ \frac{\partial F}{\partial \phi}\end{bmatrix} = \begin{bmatrix} \frac{\partial x}{\partial r} & \frac{\partial y}{\partial r} & \frac{\partial z}{\partial r}\\ \ \\ \frac{\partial x}{\partial \theta} & \frac{\partial y}{\partial \theta} & \frac{\partial z}{\partial \theta}\\ \ \\ \frac{\partial x}{\partial \phi} & \frac{\partial y}{\partial \phi} & \frac{\partial z}{\partial \phi}\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial x}\\ \ \\ \frac{\partial F}{\partial y} \\ \ \\ \frac{\partial F}{\partial z} \end{bmatrix} = \begin{bmatrix} \sin \theta \cos \phi & \sin \theta \sin \phi & \cos \theta\\ \ \\ r \cos \theta \cos \phi & r \cos \theta \sin \phi & - r \sin \theta\\ \ \\ - r \sin \theta \sin \phi & r \sin \theta \cos \phi & 0\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial x}\\ \ \\ \frac{\partial F}{\partial y} \\ \ \\ \frac{\partial F}{\partial z} \end{bmatrix}$

We can solve this system by inverting the coefficient matrix to get

$\begin{bmatrix} \frac{\partial F}{\partial x}\\ \ \\ \frac{\partial F}{\partial y} \\ \ \\ \frac{\partial F}{\partial z}\end{bmatrix} = \begin{bmatrix} \sin \theta \cos \phi & \frac{\cos \theta \cos \phi}{r} & -\frac{\sin \phi}{r \sin \theta}\\ \ \\ \sin \theta \sin \phi & \frac{\cos \theta \sin \phi}{r} & \frac{\cos \phi}{r \sin \theta}\\ \ \\ \cos \theta & -\frac{\sin \theta}{r} & 0\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial r}\\ \ \\ \frac{\partial F}{\partial \theta} \\ \ \\ \frac{\partial F}{\partial \phi} \end{bmatrix}$
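As a sanity check on the inversion, a short script can multiply the inverse matrix by the Jacobian at a sample point and confirm that the product is the identity (all names are ours):

```python
import math

def jacobian(r, th, ph):
    """The matrix of d(x,y,z)/d(r,theta,phi) written out above."""
    s, c, sp, cp = math.sin(th), math.cos(th), math.sin(ph), math.cos(ph)
    return [[s * cp,      s * sp,      c],
            [r * c * cp,  r * c * sp, -r * s],
            [-r * s * sp, r * s * cp,  0.0]]

def inverse(r, th, ph):
    """The inverse of the coefficient matrix."""
    s, c, sp, cp = math.sin(th), math.cos(th), math.sin(ph), math.cos(ph)
    return [[s * cp, c * cp / r, -sp / (r * s)],
            [s * sp, c * sp / r,  cp / (r * s)],
            [c,     -s / r,       0.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

# the product should be the 3x3 identity at any sample point
r0, th0, ph0 = 2.0, 0.8, 1.1
prod = matmul(inverse(r0, th0, ph0), jacobian(r0, th0, ph0))
assert all(abs(prod[i][j] - (1.0 if i == j else 0.0)) < 1e-12
           for i in range(3) for j in range(3))
```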

Using the equations in this system together with the standard conversion equations we then have

$y \frac{\partial F}{\partial z} = r \sin \theta \sin \phi \big( \cos \theta \frac{\partial F}{\partial r} - \frac{\sin \theta}{r} \frac{\partial F}{\partial \theta} \big) = r \sin \theta \cos \theta \sin \phi \frac{\partial F}{\partial r} - \sin^2 \theta \sin \phi \frac{\partial F}{\partial \theta}$

and

$z \frac{\partial F}{\partial y} = r \cos \theta \big( \sin \theta \sin \phi \frac{\partial F}{\partial r} + \frac{\cos \theta \sin \phi}{r} \frac{\partial F}{\partial \theta} + \frac{\cos \phi}{r \sin \theta} \frac{\partial F}{\partial \phi}\big)$

$= r \cos \theta \sin \theta \sin \phi \frac{\partial F}{\partial r} + \cos^2 \theta \sin \phi \frac{\partial F}{\partial \theta} + \frac{\cos \theta \cos \phi}{\sin \theta}\frac{\partial F}{\partial \phi}$

Subtracting the second expression from the first and dropping the arbitrary function $F$ from the partial derivatives (it was only there to keep track of what the operators act on), we can then write the angular momentum operator in the $x$-direction in terms of spherical polar coordinates as

$L_x = - i \hbar \big( y \frac{\partial}{\partial z} - z \frac{\partial}{\partial y} \big)$

$= - i \hbar \big( - \cot \theta \cos \phi \frac{\partial}{\partial \phi} - \sin \phi \frac{\partial}{\partial \theta} \big)$

$= i \hbar \big(\cot \theta \cos \phi \frac{\partial}{\partial \phi} + \sin \phi \frac{\partial}{\partial \theta} \big)$

Similarly we have

$z \frac{\partial F}{\partial x} = r \cos \theta \big( \sin \theta \cos \phi \frac{\partial F}{\partial r} + \frac{\cos \theta \cos \phi}{r} \frac{\partial F}{\partial \theta} - \frac{\sin \phi}{r \sin \theta} \frac{\partial F}{\partial \phi}\big)$

$= r \cos \theta \sin \theta \cos \phi \frac{\partial F}{\partial r} + \cos^2 \theta \cos \phi \frac{\partial F}{\partial \theta} - \frac{\cos \theta \sin \phi}{\sin \theta}\frac{\partial F}{\partial \phi}$

and

$x \frac{\partial F}{\partial z} = r \sin \theta \cos \phi \big( \cos \theta \frac{\partial F}{\partial r} - \frac{\sin \theta}{r} \frac{\partial F}{\partial \theta} \big) = r \sin \theta \cos \theta \cos \phi \frac{\partial F}{\partial r} - \sin^2 \theta \cos \phi \frac{\partial F}{\partial \theta}$

Therefore

$L_y = - i \hbar \big( z \frac{\partial}{\partial x} - x \frac{\partial}{\partial z} \big)$

$= - i \hbar \big(\cos \phi \frac{\partial}{\partial \theta} - \cot \theta \sin \phi \frac{\partial}{\partial \phi} \big)$

Finally, we have

$x \frac{\partial F}{\partial y} = r \sin \theta \cos \phi \big( \sin \theta \sin \phi \frac{\partial F}{\partial r} + \frac{\cos \theta \sin \phi}{r} \frac{\partial F}{\partial \theta} + \frac{\cos \phi}{r \sin \theta} \frac{\partial F}{\partial \phi}\big)$

$= r \sin^2 \theta \cos \phi \sin \phi \frac{\partial F}{\partial r} + \sin \theta \cos \theta \sin \phi \cos \phi \frac{\partial F}{\partial \theta} + \cos^2 \phi \frac{\partial F}{\partial \phi}$

and

$y \frac{\partial F}{\partial x} = r \sin \theta \sin \phi \big( \sin \theta \cos \phi \frac{\partial F}{\partial r} + \frac{\cos \theta \cos \phi}{r} \frac{\partial F}{\partial \theta} - \frac{\sin \phi}{r \sin \theta} \frac{\partial F}{\partial \phi}\big)$

$= r \sin^2 \theta \cos \phi \sin \phi \frac{\partial F}{\partial r} + \sin \theta \cos \theta \sin \phi \cos \phi \frac{\partial F}{\partial \theta} - \sin^2 \phi \frac{\partial F}{\partial \phi}$

Therefore

$L_z = - i \hbar \big( x \frac{\partial}{\partial y} - y \frac{\partial}{\partial x} \big)$

$= - i \hbar \frac{\partial}{\partial \phi}$

Having obtained the components of $\mathbf{L}$ in spherical polar coordinates we can now finally calculate the operator representation of $L^2$ as

$L^2 = L_x^2 + L_y^2 + L_z^2$

$= \big[ i \hbar \big(\cot \theta \cos \phi \frac{\partial}{\partial \phi} + \sin \phi \frac{\partial}{\partial \theta} \big) \big]^2 + \big[ - i \hbar \big(\cos \phi \frac{\partial}{\partial \theta} - \cot \theta \sin \phi \frac{\partial}{\partial \phi} \big) \big]^2 + \big[ - i \hbar \frac{\partial}{\partial \phi} \big]^2$

$= - \hbar^2 \big[ \sin^2 \phi \frac{\partial^2}{\partial \theta^2} + \big(\sin \phi \frac{\partial}{\partial \theta} \big) \big(\cot \theta \cos \phi \frac{\partial}{\partial \phi} \big) + \big(\cot \theta \cos \phi \frac{\partial}{\partial \phi} \big)\big(\sin \phi \frac{\partial}{\partial \theta} \big) + \cot^2 \theta \cos^2 \phi \frac{\partial^2}{\partial \phi^2} \big]$

$- \hbar^2 \big[ \cos^2 \phi \frac{\partial^2}{\partial \theta^2} - \big(\cos \phi \frac{\partial}{\partial \theta} \big) \big(\cot \theta \sin \phi \frac{\partial}{\partial \phi} \big) - \big(\cot \theta \sin \phi \frac{\partial}{\partial \phi} \big)\big(\cos \phi \frac{\partial}{\partial \theta} \big) + \cot^2 \theta \sin^2 \phi \frac{\partial^2}{\partial \phi^2} \big]$

$- \hbar^2 \frac{\partial^2}{\partial \phi^2}$

$= - \hbar^2 \big( \frac{\partial^2}{\partial \theta^2} + \cot^2 \theta \frac{\partial^2}{\partial \phi^2} + \frac{\partial^2}{\partial \phi^2} - \sin \phi \frac{1}{\sin^2 \theta} \cos \phi \frac{\partial}{\partial \phi} + \cot \theta \cos^2 \phi \frac{\partial}{\partial \theta}$

$+ \cos \phi \frac{1}{\sin^2 \theta} \sin \phi \frac{\partial}{\partial \phi} + \cot \theta \sin^2 \phi \frac{\partial}{\partial \theta}\big)$

$= - \hbar^2 \big( \frac{\partial^2}{\partial \theta^2} + \cot \theta \frac{\partial}{\partial \theta} + \cot^2 \theta \frac{\partial^2}{\partial \phi^2} + \frac{\partial^2}{\partial \phi^2}\big)$

$= - \hbar^2 \big( \frac{1}{\sin \theta}\frac{\partial}{\partial \theta} \big( \sin \theta \frac{\partial}{\partial \theta}\big) + \frac{1}{\sin^2 \theta}\frac{\partial^2}{\partial \phi^2} \big)$
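As a check on this final form, one can apply it (with $\hbar = 1$) to the spherical harmonic $Y_1^0 \propto \cos \theta$ via finite differences and recover the eigenvalue $l(l + 1) = 2$; a numerical sketch using the expanded form from the penultimate line:

```python
import math

def L2(f, th, ph, h=1e-5, hbar=1.0):
    """Apply the spherical-polar L^2 operator (expanded form: d^2/dtheta^2
    + cot(theta) d/dtheta + (1/sin^2 theta) d^2/dphi^2, times -hbar^2)
    to f(theta, phi) using central finite differences."""
    d2_th = (f(th + h, ph) - 2 * f(th, ph) + f(th - h, ph)) / h ** 2
    d_th = (f(th + h, ph) - f(th - h, ph)) / (2 * h)
    d2_ph = (f(th, ph + h) - 2 * f(th, ph) + f(th, ph - h)) / h ** 2
    cot = math.cos(th) / math.sin(th)
    return -hbar ** 2 * (d2_th + cot * d_th + d2_ph / math.sin(th) ** 2)

# Y_1^0 ~ cos(theta) should be an eigenfunction with eigenvalue l(l+1) = 2
f = lambda th, ph: math.cos(th)
th0 = 0.9
assert abs(L2(f, th0, 0.3) - 2 * math.cos(th0)) < 1e-4
```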