On passing from the discrete to the continuous form of Slutsky’s identity in microeconomics

I have noticed that there is a sharp ‘jump’ in the literature concerning Slutsky’s decomposition equation for the effects of a price change on demand. Elementary discussions usually employ a discrete version of Slutsky’s equation which assumes a relatively large price change. This is typically illustrated in a `pivot-shift’ type diagram which is familiar to all undergraduate microeconomics students. In more advanced (typically postgraduate) treatments, however, the discussion suddenly jumps to using a full-blown partial differential equation form of Slutsky’s identity assuming an infinitesimal price change. The partial differential equation form is usually expressed in terms of a Hicksian demand function. I have not been able to find any appealing discussions of how the set-up that relates to the discrete form of Slutsky’s equation naturally evolves into the partial differential equation form as we pass from a large price change to the limit of an infinitesimally small price change. In this note I want to explore how the discrete form naturally evolves into the partial differential equation form when we make the price change infinitesimally small.

Slutsky’s decomposition equation expresses the effects of a price change on Marshallian demand in terms of a pure substitution effect, which is always negative, and an income effect, which can be positive or negative depending on whether the good is a normal good or an inferior good respectively. I will employ a simple two-good numerical example to illustrate the discrete form of Slutsky’s equation, using a Cobb-Douglas utility function of the form

u(x, y) = x^{1/2}y^{1/2}

The mathematical problem is to find the combination of x and y which maximises this utility function subject to the budget constraint

p_1x + p_2y = m

The Cobb-Douglas utility function is globally concave and smooth so we are guaranteed to find a unique interior solution by partial differentiation. One normally proceeds by taking the natural logarithm of the utility function (this is a strictly increasing transformation, so it does not affect the preferences represented by the original utility function) and setting up the Lagrangian for the problem, namely

L = \frac{1}{2}\ln x + \frac{1}{2}\ln y + \lambda (m - p_1x -p_2y)

Taking first-order partial derivatives with respect to x, y and \lambda and setting them equal to zero we get

\frac{\partial L}{\partial x}  = \frac{1}{2x} - \lambda p_1 = 0

\frac{\partial L}{\partial y} = \frac{1}{2y} - \lambda p_2 = 0

\frac{\partial L}{\partial \lambda} = m - p_1x - p_2y = 0

This is a system of three equations in three unknowns. Dividing the first equation by the second and rearranging one obtains

\frac{y}{x} = \frac{p_1}{p_2}

Solving the third equation for x we get

x = \frac{m}{p_1} - \frac{p_2}{p_1}y

and substituting this into the equation above we get

\frac{y}{\frac{m}{p_1} - \frac{p_2}{p_1}y} = \frac{p_1}{p_2}

and solving this for y gives

y = \frac{m}{2p_2}

This is the uncompensated demand function (often also called the Marshallian demand function) for good y. By symmetry, the uncompensated demand function for good x is

x = \frac{m}{2p_1}

(Note that rearranging the demand function for y we get p_2y = \frac{m}{2} which says that the consumer will spend exactly half of the income on y, and similarly for x. Whenever the Cobb-Douglas utility function is in a form in which the exponents on the goods are fractions which sum to 1, these fractions tell us the proportions of income which will be spent on the corresponding goods. Our utility function was of the form u(x, y) = x^{1/2}y^{1/2} so one-half of the total income is spent on each good, as we confirmed with the above calculation).
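As a quick numerical sanity check of these demand functions, the sketch below maximises the log utility directly along the budget line with a ternary search instead of using the Lagrangian. The search is valid because \frac{1}{2}\ln x + \frac{1}{2}\ln((m - p_1x)/p_2) is strictly concave in x. The prices p_1 = 10, p_2 = 5 and income m = 1000 in the usage lines are illustrative assumptions, not values from the text.

```python
import math


def log_utility(x: float, y: float) -> float:
    # ln u = (1/2) ln x + (1/2) ln y, the monotonic transform used in the Lagrangian
    return 0.5 * math.log(x) + 0.5 * math.log(y)


def best_bundle_on_budget_line(p1: float, p2: float, m: float, iterations: int = 200) -> tuple[float, float]:
    """Ternary search for the utility-maximising (x, y) with p1*x + p2*y = m."""
    lo, hi = 0.0, m / p1                      # feasible range of x
    for _ in range(iterations):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        # y is pinned down by the budget constraint
        if log_utility(a, (m - p1 * a) / p2) < log_utility(b, (m - p1 * b) / p2):
            lo = a                            # optimum lies to the right of a
        else:
            hi = b                            # optimum lies to the left of b
    x = (lo + hi) / 2
    return x, (m - p1 * x) / p2


x_star, y_star = best_bundle_on_budget_line(10.0, 5.0, 1000.0)
print(x_star, 1000.0 / (2 * 10.0))   # numerical x vs m / (2 p1)
print(y_star, 1000.0 / (2 * 5.0))    # numerical y vs m / (2 p2)
```

The numerical optimum should agree with m/(2p_1) and m/(2p_2) to within the precision allowed by the flatness of the objective near its maximum.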

To illustrate Slutsky’s decomposition of the effects of a price change into a pure substitution effect and an income effect, consider the above uncompensated demand function for x and suppose that m = \pounds 1000 while p_1 = \pounds 10. The amount of x demanded at this income and price is then

x(p_1, m) = \frac{1000}{(2)(10)}= 50

This corresponds to the amount of x in the bundle A in the diagram below.


Now suppose that the price rises to p_1^{*} = \pounds 20.

The amount of x demanded at the original income and this new price falls to

x(p_1^{*}, m) = \frac{1000}{(2)(20)} = 25

This corresponds to the amount of x in the bundle C in the diagram.

Slutsky’s decomposition of this total change in demand begins by asking what change in income would be enough to enable the consumer to buy the original bundle A (with its original amount of x) at the new price. Since only the price of x has changed, this amount of additional income is obtained as

p_1^{*}x - p_1x = (20)(50) - (10)(50) = \pounds 500

Therefore, `compensating’ the consumer by increasing the income level from m = \pounds 1000 to m^{*} = \pounds 1500 enables them to buy their original bundle A with x = 50. This increase in the income level corresponds to a shift outwards in the new budget line to a position represented by the blue budget line in the diagram.

In the sense that the original bundle A is affordable again (so purchasing power has remained constant), the consumer is now as well off as before, but the original bundle A is no longer the utility-maximising one at the new price and the higher income level. The consumer will want to adjust the bundle until the utility function is maximised at the new price and new income. The amount of x the consumer will actually demand at the new price and new income level will be

x(p_1^{*}, m^{*}) = \frac{1500}{(2)(20)} = 37.5

This corresponds to the amount of x in the bundle B in the diagram above, and is usually referred to in the literature as the compensated demand for x (as opposed to the uncompensated demand at the new price, which is the amount of x at point C). The pure substitution effect of the price rise (i.e., abstracting from the income effect) is then the change in demand for x when the price of x changes to p_1^{*} and at the same time the income level changes to m^{*} to keep the consumer’s purchasing power constant:

x(p_1^{*}, m^{*}) - x(p_1, m) = 37.5 - 50 = -12.5

This is the change in the amount of x represented by the shift from bundle A to bundle B in the diagram above.

In this numerical example, the pure substitution effect of the price rise accounts for exactly half of the total drop in the demand for x from 50 at point A to 25 at point C. The other half of the drop in the demand for x is accounted for by the income effect (sometimes called the `wealth’ effect) of the price rise, which is represented in the diagram above by a parallel shift inwards of the blue budget line to the position of the final budget line on which bundle C lies. This is the change in demand for x when we change the income level from m^{*} back to m, holding the price of x fixed at p_1^{*}. Thus, the income effect is computed as

x(p_1^{*}, m) - x(p_1^{*}, m^{*}) = 25 - 37.5 = -12.5

The substitution effect plus the income effect together account for the full drop in the demand for x as a result of moving from bundle A to bundle C in response to the price rise of x.

In this simple numerical example Slutsky’s decomposition equation takes the discrete form

x(p_1^{*}, m) - x(p_1, m) = \big \{ x(p_1^{*}, m^{*}) - x(p_1, m) \big \} + \big\{ x(p_1^{*} , m) - x(p_1^{*}, m^{*}) \big \}
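The arithmetic of this decomposition can be reproduced in a few lines. This is only a sketch: the function name `marshallian_x` is a hypothetical helper that simply encodes the demand function x = m/(2p_1) derived above.

```python
def marshallian_x(p1: float, m: float) -> float:
    """Uncompensated (Marshallian) demand for x when u = x^(1/2) y^(1/2)."""
    return m / (2 * p1)


p1, p1_new, m = 10.0, 20.0, 1000.0

x_A = marshallian_x(p1, m)                # original bundle A
m_comp = m + (p1_new - p1) * x_A          # income that keeps bundle A affordable
x_B = marshallian_x(p1_new, m_comp)       # Slutsky-compensated bundle B
x_C = marshallian_x(p1_new, m)            # final bundle C

substitution_effect = x_B - x_A           # A -> B
income_effect = x_C - x_B                 # B -> C
total_effect = x_C - x_A                  # A -> C

print(x_A, m_comp, x_B, x_C)                              # 50.0 1500.0 37.5 25.0
print(substitution_effect, income_effect, total_effect)   # -12.5 -12.5 -25.0
```

The two braces in the discrete equation correspond to `substitution_effect` and `income_effect`, and they telescope to `total_effect`.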

As we pass to the limit of an infinitesimally small price change, however, Slutsky’s decomposition equation takes the form of a partial differential equation which can be derived quite naturally from the discrete form by the following arguments. Suppose we start at point A in the above diagram again, and suppose the price changes by an infinitesimal amount \delta, i.e.,

p_1 \longrightarrow p_1 + \delta

The (infinitesimal) amount of income needed to compensate the consumer so that the bundle at A remains affordable after the price change is

(p_1 + \delta ) x - p_1 x = \delta x

We can then rewrite the discrete form of Slutsky’s equation as

x(p_1 + \delta, \ m) - x(p_1, \ m) =

\big \{ x(p_1 + \delta, \ m + \delta x) - x(p_1, \ m) \big \} + \big \{x(p_1 + \delta, \ m) - x(p_1 + \delta, \ m + \delta x) \big \}

or, equivalently,

x(p_1 + \delta, \ m) - x(p_1, \ m) =

\big \{ x(p_1 + \delta, \ m + \delta x) - x(p_1, \ m) \big \} - \big \{x(p_1 + \delta, \ m + \delta x) - x(p_1 + \delta, \ m) \big \}

Dividing through by \delta and taking the limit as \delta \rightarrow 0 we get

\frac{\partial x}{\partial p_1} = \lim_{\delta \to 0} \frac{x(p_1 + \delta, \ m + \delta x) - x(p_1, \ m)}{\delta} - \lim_{\delta \to 0} \frac{x(p_1 + \delta, \ m + \delta x) - x(p_1 + \delta, \ m)}{\delta}

The second term on the right-hand side, the income effect, can be written as

\lim_{\delta \to 0} \frac{x(p_1 + \delta, \ m + \delta x) - x(p_1 + \delta, \ m)}{\delta} = x \lim_{\delta x \to 0} \frac{x(p_1 + \delta, \ m + \delta x) - x(p_1 + \delta, \ m)}{\delta x} = x \frac{\partial x}{\partial m}
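The limit in the income-effect term can be watched numerically at point A of the example (p_1 = 10, m = 1000, so x = 50). For this demand function the quotient simplifies to x/(2(p_1 + \delta)), so it should approach x \frac{\partial x}{\partial m} = 50 \times \frac{1}{20} = 2.5 as \delta shrinks. The helper name `income_quotient` is an illustrative choice, not notation from the text.

```python
def marshallian_x(p1: float, m: float) -> float:
    return m / (2 * p1)


P1, M = 10.0, 1000.0
X_A = marshallian_x(P1, M)          # 50.0 at point A


def income_quotient(delta: float) -> float:
    """[x(p1 + d, m + d x) - x(p1 + d, m)] / d, the second term before the limit."""
    return (marshallian_x(P1 + delta, M + delta * X_A) - marshallian_x(P1 + delta, M)) / delta


target = X_A * (1 / (2 * P1))       # x * dx/dm with dx/dm = 1/(2 p1)
for delta in (1.0, 0.1, 0.01, 0.001):
    print(delta, income_quotient(delta), target)
```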

So far, then, our partial differential equation form of Slutsky’s identity has evolved to

\frac{\partial x}{\partial p_1} = \lim_{\delta \to 0} \frac{x(p_1 + \delta, \ m + \delta x) - x(p_1, \ m)}{\delta} - x \frac{\partial x}{\partial m}

Now consider the first term on the right-hand side, namely

\lim_{\delta \to 0} \frac{x(p_1 + \delta, \ m + \delta x) - x(p_1, \ m)}{\delta}

This is the effect of the (infinitesimal) price change when we eliminate the income effect by changing the income to m + \delta x, i.e., it is the pure price substitution effect. What we are going to do is replace the two terms in the numerator by corresponding Hicksian demand functions, usually denoted by the letter h. Consider first the original demand function x(p_1, \ m) at point A in the diagram above. This was obtained by maximising utility subject to the price p_1 and income level m. However, we can also regard it as having been obtained by solving the dual problem of minimising the expenditure required to achieve the level of utility u(x^A, y^A) associated with the bundle A in the diagram above, given the price p_1. We would find that the minimised expenditure level would be

E(p_1, \ u(x^A, y^A)) = m

and the amount of x demanded would be the same as the original amount:

x(p_1, \ m) = x(p_1, \ E(p_1, \ u(x^A, y^A))) = h(p_1, \ u(x^A, y^A))

(Here the Hicksian demand h(p_1, \ u) is the amount of x in the cheapest bundle that achieves utility u at price p_1, so it coincides with the Marshallian demand evaluated at the income E(p_1, \ u).)

Now consider the problem of minimising the expenditure required to achieve the level of utility u(x^A, y^A) given the changed price p_1 + \delta. The minimised expenditure level would be

E(p_1 + \delta, \ u(x^A, y^A))

For \delta > 0, this minimised expenditure is `sandwiched’ between m and m + \delta x: it cannot be less than m, because the expenditure needed to reach a given utility level cannot fall when a price rises, and it cannot be more than m + \delta x, because bundle A still delivers utility u(x^A, y^A) and costs exactly m + \delta x at the new price. That is,

m \leq E(p_1 + \delta, \ u(x^A, y^A)) \leq m + \delta x

so the three quantities in the inequality coincide in the limit as \delta \rightarrow 0. In fact more is true: the gap between m + \delta x and E(p_1 + \delta, \ u(x^A, y^A)) shrinks faster than \delta itself (it is of order \delta^2), because bundle A is already cost-minimising at the original price, so to first order re-optimising after a small price change saves nothing (this is Shephard’s lemma, \frac{\partial E}{\partial p_1} = h). This matters because we divide by \delta before taking the limit. Therefore, as long as we are taking limits as \delta \rightarrow 0, we can replace x(p_1 + \delta, \ m + \delta x) with the Hicksian demand at the new price and the original utility level:

x(p_1 + \delta, \ E(p_1 + \delta, \ u(x^A, y^A))) = h(p_1 + \delta, \ u(x^A, y^A))

(What I am basically saying here is that as \delta \rightarrow 0, we can replace the point B in the diagram above with a point that is on the same indifference curve as the original bundle at point A. In this case the consumer would be compensated not by restoring their original purchasing power after the price change, but by restoring the original level of utility they were experiencing. As \delta becomes infinitesimally small, it makes no difference which of these two points we think about for the pure substitution effect. When the price change is large, these two points diverge and we then have a distinction between a Slutsky substitution effect which involves restoring the original purchasing power, and a Hicksian substitution effect which involves restoring the original utility level. The distinction between these two disappears when the price change is infinitesimal). We then have

\lim_{\delta \to 0} \frac{x(p_1 + \delta, \ m + \delta x) - x(p_1, \ m)}{\delta} = \lim_{\delta \to 0} \frac{h(p_1 + \delta, \ u(x^A, y^A)) - h(p_1, \ u(x^A, y^A))}{\delta}

= \frac{\partial h}{\partial p_1}

We now have the final partial differential equation form of Slutsky’s decomposition equation in the way it is usually written using the Hicksian demand function:

\frac{\partial x}{\partial p_1} = \frac{\partial h}{\partial p_1} - x \frac{\partial x}{\partial m}

As before, this says that the total effect of a price change is composed of a pure substitution effect (with income adjusted to exactly compensate the consumer for the wealth effect of the price change) and an income effect.
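The claim that Slutsky compensation and Hicksian compensation give the same substitution effect in the limit can also be checked numerically. The sketch below assumes a price of good y of p_2 = 10, which the original example does not specify; at point A this only fixes the utility level (x = y = u = 50), and p_2 cancels from \frac{\partial h}{\partial p_1}. It uses the Hicksian demand h = (p_2/p_1)^{1/2}u from the text and the corresponding expenditure function E = 2u(p_1p_2)^{1/2}, obtained by substituting the cost-minimising bundle into p_1x + p_2y. Both difference quotients should approach -\frac{m}{4(p_1)^2} = -2.5, while the gap between the two compensating incomes, divided by \delta^2, settles near a constant (1.25 for these numbers).

```python
import math

P1, P2, M = 10.0, 10.0, 1000.0    # P2 = 10 is a hypothetical choice


def marshallian_x(p1: float, m: float) -> float:
    return m / (2 * p1)


def hicksian_x(p1: float, u: float, p2: float = P2) -> float:
    # h = (p2 / p1)^(1/2) u, from the expenditure minimisation problem in the text
    return math.sqrt(p2 / p1) * u


def expenditure(p1: float, u: float, p2: float = P2) -> float:
    # E = 2 u (p1 p2)^(1/2): the cost of the cost-minimising bundle
    return 2 * u * math.sqrt(p1 * p2)


X_A = marshallian_x(P1, M)
U_A = math.sqrt(X_A * M / (2 * P2))      # utility of bundle A


def slutsky_quotient(delta: float) -> float:
    # income raised to m + delta x so that bundle A stays affordable
    return (marshallian_x(P1 + delta, M + delta * X_A) - X_A) / delta


def hicks_quotient(delta: float) -> float:
    # utility held at u(x^A, y^A) instead
    return (hicksian_x(P1 + delta, U_A) - hicksian_x(P1, U_A)) / delta


def compensation_gap(delta: float) -> float:
    # (m + delta x) - E(p1 + delta, u^A): Slutsky compensation minus Hicks compensation
    return M + delta * X_A - expenditure(P1 + delta, U_A)


for delta in (5.0, 1.0, 0.1, 0.01):
    print(delta, slutsky_quotient(delta), hicks_quotient(delta), compensation_gap(delta) / delta**2)
```

For the large change \delta = 5 the two quotients differ visibly, which is the Slutsky/Hicks distinction described above; for small \delta they agree.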

To check that this partial differential equation works in the context of our Cobb-Douglas example above, we can compute the partial derivatives explicitly. Since the demand function for x under the above Cobb-Douglas preferences is

x = \frac{m}{2p_1}

we have

\frac{\partial x}{\partial p_1} = - \frac{m}{2(p_1)^2}

By solving a simple expenditure minimisation problem it can easily be shown that

h = \frac{(p_2)^{1/2}}{(p_1)^{1/2}} u

(The expenditure minimisation problem would be to minimise E = p_1 x + p_2 y subject to x^{1/2} y^{1/2} = u. Solving the constraint equation for y and substituting into the objective function gives an unconstrained minimisation problem in the variable x only. Solving this yields the above expression for the Hicksian demand function h).
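The minimisation described in this parenthesis can be sketched numerically: substitute y = u^2/x from the constraint and minimise E(x) = p_1x + p_2u^2/x, which is convex for x > 0. The search bracket [p_2u/(p_1 + p_2), (p_1 + p_2)u/p_1] is an assumption of this sketch; it contains the minimiser because the optimal bundle can cost no more than the bundle x = y = u, which costs (p_1 + p_2)u. The price and utility values are illustrative.

```python
import math


def min_cost_x(p1: float, p2: float, u: float, iterations: int = 200) -> float:
    """Ternary search for the x minimising p1*x + p2*u**2/x (y eliminated via x*y = u**2)."""
    def cost(x: float) -> float:
        return p1 * x + p2 * u**2 / x

    lo = p2 * u / (p1 + p2)          # bracket derived from cost(x*) <= cost(u) = (p1 + p2) u
    hi = (p1 + p2) * u / p1
    for _ in range(iterations):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if cost(a) > cost(b):
            lo = a                   # minimiser lies to the right of a
        else:
            hi = b                   # minimiser lies to the left of b
    return (lo + hi) / 2


p1, p2, u = 10.0, 5.0, 50.0
print(min_cost_x(p1, p2, u), math.sqrt(p2 / p1) * u)   # numerical vs closed-form h
```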

Therefore we also have

\frac{\partial h}{\partial p_1} = -\frac{1}{2}\frac{(p_2)^{1/2}}{(p_1)^{3/2}} u

= -\frac{1}{2}\frac{(p_2)^{1/2}}{(p_1)^{3/2}}\big(\frac{m}{2p_1}\big)^{1/2}\big(\frac{m}{2p_2}\big)^{1/2}

= -\frac{m}{4(p_1)^2}

while the income effect term is

-\frac{\partial x}{\partial m}\cdot x = -\frac{1}{2p_1}\frac{m}{2p_1} = -\frac{m}{4(p_1)^2}

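Putting these into the partial differential form of Slutsky’s equation, the right-hand side is -\frac{m}{4(p_1)^2} - \frac{m}{4(p_1)^2} = -\frac{m}{2(p_1)^2}, which is exactly \frac{\partial x}{\partial p_1}, so the equation is satisfied.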
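This verification can also be repeated numerically, replacing the analytic derivatives by central differences. This is a sketch at the example point p_1 = 10, m = 1000; the price p_2 = 10 is a hypothetical choice (it cancels from \frac{\partial h}{\partial p_1} at point A), and the helper names are illustrative.

```python
import math

P1, P2, M = 10.0, 10.0, 1000.0   # P2 is a hypothetical value


def x_demand(p1: float, m: float) -> float:
    return m / (2 * p1)


def h_demand(p1: float, u: float) -> float:
    return math.sqrt(P2 / p1) * u


def central_diff(f, t: float, step: float = 1e-5) -> float:
    # symmetric difference quotient, error O(step**2) for smooth f
    return (f(t + step) - f(t - step)) / (2 * step)


x0 = x_demand(P1, M)
u0 = math.sqrt(x0 * M / (2 * P2))                        # u(x^A, y^A)

dx_dp1 = central_diff(lambda p: x_demand(p, M), P1)      # total effect
dh_dp1 = central_diff(lambda p: h_demand(p, u0), P1)     # substitution effect
dx_dm = central_diff(lambda m: x_demand(P1, m), M)

print(dx_dp1, dh_dp1 - x0 * dx_dm)    # both should be close to -m / (2 p1^2) = -5
```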

On the classification of singularities, with an application to non-rotating black holes

In mathematics a singularity is a point at which a mathematical object (e.g., a function) is not defined or behaves `badly’ in some way. Singularities can be isolated (e.g., removable singularities, poles and essential singularities) or nonisolated (e.g., branch cuts). For teaching purposes, I want to delve into some of the mathematical aspects of isolated singularities in this note using simple examples involving the complex sine function. I will not consider nonisolated singularities in detail. These are briefly discussed with some examples on this Wikipedia page. I will also briefly look at how singularities arise in the context of black hole physics in a short final section.


Definition: A function f has an isolated singularity at the point \alpha if f is analytic on a punctured open disc \{z: 0 < |z - \alpha| < r \}, where r > 0, but not at \alpha itself.

Note that a function f is analytic at a point \alpha if it is differentiable on a region containing \alpha. Strangely, a function can have a derivative at a point without being analytic there. For example, the function f(z) = |z|^2 has a derivative at z = 0 but at no other point, as can easily be verified using the Cauchy-Riemann equations. Therefore this function is not analytic at z = 0. Also note with regard to the definition of an isolated singularity that the function MUST be analytic on the `whole’ of the punctured open disc for the singularity to count as isolated. For example, despite appearances, the function

f(z) = \frac{1}{\sqrt{z}}

does not have an isolated singularity at z = 0 because it is impossible to define a punctured open disc centred at 0 on which f(z) is analytic (the principal square root function z \mapsto \sqrt{z} is discontinuous everywhere on the negative real axis, so f(z) fails to be analytic there).

I find it appealing that all three types of isolated singularity (removable, poles and essential singularities) can be illustrated by using members of the following family of functions:

f(z) = \frac{\sin(z^m)}{z^n}

where m, n \in \mathbb{Z}. For example, if m = n = 1 we get

f_1(z) = \frac{\sin(z)}{z}

which has a removable singularity at z = 0. If m = 1, n = 3 we get

f_2(z) = \frac{\sin(z)}{z^3}

which has a pole of order 2 at z = 0. Finally, if m = -1, n = 0 we get

f_3(z) = \sin\big( \frac{1}{z} \big)

which has an essential singularity at z = 0. In each of these three cases, the function is not analytic at z = 0 but is analytic on a punctured open disc with centre 0, e.g., \{z: 0 < |z| < 1\} or indeed \mathbb{C} - \{0\} (which can be thought of as a punctured disc with infinite radius). In what follows I will use these three examples to delve into structural definitions of the three types of singularity. I will then explore their classification using Laurent series expansions.

Structural definitions of isolated singularities

Removable singularities

Suppose a function f is analytic on the punctured open disc

\{z: 0 < |z - \alpha| < r\}

and has a singularity at \alpha. The function f has a removable singularity at \alpha if there is a function g which is analytic at \alpha such that

f(z) = g(z) for 0 < |z - \alpha| < r

We can see that g extends the analyticity of f to include \alpha, so we say that g is an analytic extension of f to the open disc

\{z: |z - \alpha| < r \}

With removable singularities we always have that \lim_{z \rightarrow \alpha} f(z) exists since

\lim_{z \rightarrow \alpha} f(z) = g(\alpha)

(this will not be true for the other types of singularity) and the name of this singularity comes from the fact that we can effectively `remove’ the singularity by defining f(\alpha) = g(\alpha).

To apply this to the function

f_1(z) = \frac{\sin(z)}{z}

we first observe that the Maclaurin series expansion of \sin(z) is

\sin(z) = z - \frac{z^3}{3!} + \frac{z^5}{5!} - \frac{z^7}{7!} + \cdots for z \in \mathbb{C}

Therefore we can write

f_1(z) = 1 - \frac{z^2}{3!} + \frac{z^4}{5!} - \frac{z^6}{7!} + \cdots for z \in \mathbb{C} - \{0\}

If we then set

g(z) = 1 - \frac{z^2}{3!} + \frac{z^4}{5!} - \frac{z^6}{7!} + \cdots for z \in \mathbb{C}

we see that g(z) extends the analyticity of f_1(z) to include z = 0. We also see that

\lim_{z \rightarrow 0} f_1(z) = g(0)

Therefore f_1(z) has a removable singularity at z = 0.
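A short numerical sketch makes the removability visible: approaching 0 along several directions in the complex plane, \frac{\sin(z)}{z} settles on g(0) = 1, and away from 0 a truncated version of the series g agrees with f_1. The truncation at 20 terms and the sample directions are arbitrary choices for illustration.

```python
import cmath
import math


def f1(z: complex) -> complex:
    return cmath.sin(z) / z


def g(z: complex, terms: int = 20) -> complex:
    # partial sum of 1 - z^2/3! + z^4/5! - z^6/7! + ...
    return sum((-1) ** k * z ** (2 * k) / math.factorial(2 * k + 1) for k in range(terms))


for r in (1e-1, 1e-3, 1e-6):
    # sample a few directions of approach to 0
    print(r, [f1(r * cmath.exp(1j * theta)) for theta in (0.0, 1.0, 2.0, 4.0)])

print(f1(0.7 + 0.2j), g(0.7 + 0.2j))   # the two agree away from 0
```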

Poles of order k, k > 0

Suppose a function f is analytic on the punctured open disc

\{z: 0 < |z - \alpha| < r\}

and has a singularity at \alpha. The function f has a pole of order k at \alpha if there is a function g, analytic at \alpha with g(\alpha) \neq 0, such that

f(z) = \frac{g(z)}{(z - \alpha)^k} for 0 < |z - \alpha| < r

With poles of order k we always have that

f(z) \rightarrow \infty as z \rightarrow \alpha

(which distinguishes them from removable singularities), and the limit

\lim_{z \rightarrow \alpha} (z - \alpha)^k f(z)

exists and is nonzero (since \lim_{z \rightarrow \alpha} (z - \alpha)^k f(z) = g(\alpha) \neq 0).

To apply this to the function

f_2(z) = \frac{\sin(z)}{z^3}

we first observe that

f_2(z) = \frac{\sin(z)/z}{z^2} = \frac{g(z)}{z^2} for z \in \mathbb{C} - \{0\}

where g is the function

g(z) = 1 - \frac{z^2}{3!} + \frac{z^4}{5!} - \frac{z^6}{7!} + \cdots for z \in \mathbb{C}

Since g(0) = 1 \neq 0, we see that f_2(z) behaves like \frac{1}{z^2} near z = 0 and

f_2(z) \rightarrow \infty as z \rightarrow 0

so the singularity at z = 0 is not removable. We also see that

\lim_{z \rightarrow 0} z ^2 f_2(z) = g(0) = 1

Therefore the function f_2(z) has a pole of order 2 at z = 0.
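Numerically, as z approaches 0, |f_2(z)| grows like 1/|z|^2 while z^2 f_2(z) tends to g(0) = 1. Multiplying by z alone is not enough to tame it, which is why the order is 2 rather than 1. In this sketch the direction of approach (angle 0.7) is an arbitrary choice.

```python
import cmath


def f2(z: complex) -> complex:
    return cmath.sin(z) / z**3


for r in (1e-1, 1e-3, 1e-5):
    z = r * cmath.exp(0.7j)                 # one sample direction of approach
    print(r, abs(f2(z)), abs(z * f2(z)), z**2 * f2(z))
```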

Essential singularities

Suppose a function f is analytic on the punctured open disc

\{z: 0 < |z - \alpha| < r\}

and has a singularity at \alpha. The function f has an essential singularity at \alpha if the singularity is neither removable nor a pole. Such a singularity cannot be removed in any way, including by multiplying by any (z - \alpha)^k, hence the name.

With essential singularities we have that

\lim_{z \rightarrow \alpha} f(z)

does not exist, and f(z) does not tend to infinity as z \rightarrow \alpha.

To apply this to the function

f_3(z) = \sin\big( \frac{1}{z}\big)

we observe that if we restrict the function to the real axis and consider a sequence of points

z_n = \frac{2}{(2n + 1) \pi}

then we have that z_n \rightarrow 0 whereas

f_3(z_n) = \sin\big(\frac{(2n + 1) \pi}{2}\big) = (-1)^n

Since these values keep alternating between 1 and -1, the limit

\lim_{z \rightarrow 0} f_3(z)

does not exist, so the singularity is not removable, but it is also the case that

f_3(z) \not \rightarrow \infty as z \rightarrow 0

(indeed |f_3(z_n)| = 1 for every n), so the singularity is not a pole. Since it is neither a removable singularity nor a pole, it must be an essential singularity.
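The sketch below evaluates f_3 along the real sequence z_n used above and, for contrast, along the imaginary axis z = it, where \sin(1/(it)) = -i\sinh(1/t) has modulus \sinh(1/t). Bounded along one path and unbounded along another, f_3 neither has a limit nor tends to infinity at 0. The sample values of n and t are arbitrary.

```python
import cmath
import math


def f3(z: complex) -> complex:
    return cmath.sin(1 / z)


# real sequence z_n = 2 / ((2n + 1) pi) -> 0: values alternate +1, -1
real_path = [f3(2 / ((2 * n + 1) * math.pi)).real for n in range(6)]

# imaginary axis z = i t, t -> 0+: |f3| = sinh(1/t) grows without bound
imag_path = [abs(f3(1j * t)) for t in (0.5, 0.2, 0.1, 0.05)]

print(real_path)
print(imag_path)
```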

Classification of isolated singularities using Laurent series

By Laurent’s Theorem, a function f which is analytic on an open annulus

A = \{z: 0 \leq r_1 < |z - \alpha| < r_2 \leq \infty \}


(shown in the diagram) can be represented as an extended power series of the form

f(z) = \sum_{n = -\infty}^{\infty} a_n(z - \alpha)^n

= \cdots + \frac{a_{-2}}{(z - \alpha)^2} + \frac{a_{-1}}{(z - \alpha)} + a_0 + a_1 (z - \alpha) + a_2 (z - \alpha)^2 + \cdots

for z \in A, which converges at all points in the annulus. It is an `extended’ power series because it involves negative powers of (z - \alpha). (The part of the power series involving negative powers is often referred to as the singular part. The part involving non-negative powers is referred to as the analytic part). This extended power series representation is the Laurent series about \alpha for the function f on the annulus A. Laurent series are also often used in the case when A is a punctured open disc, in which case we refer to the series as the Laurent series about \alpha for the function f.

The Laurent series representation of a function on an annulus A is unique. We can often use simple procedures, such as finding ordinary Maclaurin or Taylor series expansions, to obtain an extended power series and we can feel safe in the knowledge that the power series thus obtained must be the Laurent series.

Laurent series expansions can be used to classify isolated singularities by virtue of the following result: If a function f has an isolated singularity at \alpha and if its Laurent series expansion about \alpha is

f(z) = \sum_{n = -\infty}^{\infty} a_n(z - \alpha)^n

then

(a) f has a removable singularity at \alpha iff a_n = 0 for all n < 0;

(b) f has a pole of order k at \alpha iff a_n = 0 for all n < -k and a_{-k} \neq 0;

(c) f has an essential singularity at \alpha iff a_n \neq 0 for infinitely many n < 0.

To apply this to our three examples, observe that the function

f_1(z) = \frac{\sin(z)}{z}

has a singularity at 0 and its Laurent series expansion about 0 is

\frac{\sin(z)}{z} = 1 - \frac{z^2}{3!} + \frac{z^4}{5!} - \frac{z^6}{7!} + \cdots

for z \in \mathbb{C} - \{0\}. This has no non-zero coefficients in its singular part (i.e., it only has an analytic part) so the singularity is a removable one.

The function

f_2(z) = \frac{\sin(z)}{z^3}

has a singularity at 0 and its Laurent series expansion about 0 is

\frac{\sin(z)}{z^3} = \frac{1}{z^2} - \frac{1}{3!} + \frac{z^2}{5!} - \cdots

for z \in \mathbb{C} - \{0\}. This has a_n = 0 for all n < -2 and a_{-2} \neq 0, so the singularity in this case is a pole of order 2.

Finally, the function

f_3(z) = \sin\big( \frac{1}{z} \big)

has a singularity at 0 and its Laurent series expansion about 0 is

\sin \big(\frac{1}{z} \big) = \frac{1}{z} - \frac{1}{3! z^3} + \frac{1}{5! z^5} - \cdots

for z \in \mathbb{C} - \{0\}. This has a_n \neq 0 for infinitely many n < 0 so the singularity here is an essential singularity.
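The coefficients a_n can also be estimated numerically from the integral formula a_n = \frac{1}{2\pi i}\oint f(z)(z - \alpha)^{-n-1}\,dz taken around a circle inside the annulus. On the circle z = \alpha + w, w = \rho e^{i\theta}, this becomes the average of f(\alpha + w)w^{-n} over \theta, which the trapezoidal rule approximates very accurately. This is a sketch: the radius 1 and the 256 sample points are arbitrary choices, and a finite computation can only exhibit finitely many nonzero coefficients, so for f_3 it suggests rather than proves that infinitely many a_n with n < 0 are nonzero.

```python
import cmath
import math


def laurent_coefficient(f, n: int, alpha: complex = 0j, radius: float = 1.0, samples: int = 256) -> complex:
    """Trapezoidal-rule estimate of a_n = (1/(2 pi i)) * contour integral of f(z) (z - alpha)^(-n-1) dz.

    With z = alpha + w, w = radius * e^(i theta), dz = i w dtheta, so a_n is the mean of f(alpha + w) * w^(-n).
    """
    total = 0j
    for j in range(samples):
        w = radius * cmath.exp(2j * math.pi * j / samples)
        total += f(alpha + w) * w ** (-n)
    return total / samples


examples = {
    "sin(z)/z": lambda z: cmath.sin(z) / z,
    "sin(z)/z^3": lambda z: cmath.sin(z) / z**3,
    "sin(1/z)": lambda z: cmath.sin(1 / z),
}

for name, f in examples.items():
    coeffs = {n: laurent_coefficient(f, n) for n in range(-6, 3)}
    # these coefficients are real, so print rounded real parts
    print(name, {n: round(c.real, 6) for n, c in coeffs.items()})
```

The printed tables should show no negative-index coefficients for f_1, a single leading coefficient a_{-2} = 1 for f_2, and the alternating odd coefficients 1, -1/3!, 1/5! for f_3.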

Singularities in Schwarzschild black holes

One often hears about singularities in the context of black hole physics and I wanted to quickly look at singularities in the particular case of non-rotating black holes. A detailed investigation of the various singularities that appear in exact solutions of Einstein’s field equations was conducted in the 1960s and 1970s by Penrose, Hawking, Geroch and others. See, e.g., this paper by Penrose and Hawking. There is now a vast literature on this topic. The following discussion is just my own quick look at how the ideas might arise.

The spacetime of a non-rotating spherical black hole is usually analysed using the Schwarzschild solution of the Einstein field equations for an isolated spherical mass m. In spherical coordinates this is the metric

\Delta \tau = \bigg[ \big(1 - \frac{k}{r}\big) (\Delta t)^2 - \frac{1}{c^2} \bigg\{\frac{(\Delta r)^2}{\big(1 - \frac{k}{r}\big)} + r^2(\Delta \theta)^2 + r^2 \sin^2 \theta (\Delta \phi)^2\bigg\} \bigg]^{1/2}

where k = \frac{2mG}{c^2} and m is the mass of the spherically symmetric static object exterior to which the Schwarzschild metric applies. If we consider only radial motion (i.e., world lines for which \Delta \theta = \Delta \phi = 0) the Schwarzschild metric simplifies to

(\Delta \tau)^2 = \big(1 - \frac{k}{r}\big) (\Delta t)^2 - \frac{1}{c^2}\frac{(\Delta r)^2}{\big(1 - \frac{k}{r}\big)}

We can see that the \Delta r term in the metric becomes infinite at r = k so there is apparently a singularity here. However, this singularity is `removable’ by re-expressing the metric in a new set of coordinates, r and t^{\prime}, known as the Eddington-Finkelstein coordinates. The transformed metric has the form

(\Delta \tau)^2 = \big(1 - \frac{k}{r}\big) (\Delta t^{\prime})^2 - \frac{2k \Delta t^{\prime} \Delta r}{cr} - \frac{(\Delta r)^2}{c^2}\big(1 + \frac{k}{r}\big)

which does not behave badly at r = k. In general relativity, this type of removable singularity is known as a coordinate singularity. Another example is the apparent singularity at the 90^{\circ} latitude in spherical coordinates, which disappears when a different coordinate system is used.
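The text does not spell out the change of time coordinate. A standard choice that reproduces the form above, assumed here, is t^{\prime} = t + \frac{k}{c}\ln|r/k - 1|, so that \Delta t^{\prime} = \Delta t + \frac{k}{c}\frac{\Delta r}{r - k} for small steps. The transformation itself is singular at r = k, but the resulting line element is not. The sketch below checks numerically that the two radial line elements agree at sample points on either side of r = k, and that the Eddington-Finkelstein form stays finite at r = k itself. The values of k, c and the step sizes are arbitrary test values.

```python
import math


def schwarzschild_radial(dt: float, dr: float, r: float, k: float, c: float) -> float:
    """(Delta tau)^2 from the radial Schwarzschild metric; undefined at r = k."""
    f = 1 - k / r
    return f * dt**2 - dr**2 / (c**2 * f)


def ef_radial(dtp: float, dr: float, r: float, k: float, c: float) -> float:
    """(Delta tau)^2 from the radial Eddington-Finkelstein form in the text."""
    return (1 - k / r) * dtp**2 - 2 * k * dtp * dr / (c * r) - dr**2 / c**2 * (1 + k / r)


def ef_time_step(dt: float, dr: float, r: float, k: float, c: float) -> float:
    # differential of the assumed transformation t' = t + (k/c) ln|r/k - 1|
    return dt + (k / c) * dr / (r - k)


k, c, dt, dr = 1.0, 2.0, 0.3, 0.2
for r in (0.4, 2.5, 10.0):            # inside and outside the horizon r = k
    lhs = schwarzschild_radial(dt, dr, r, k, c)
    rhs = ef_radial(ef_time_step(dt, dr, r, k, c), dr, r, k, c)
    print(r, lhs, rhs, math.isclose(lhs, rhs))

print(ef_radial(dt, dr, k, k, c))     # finite at r = k
```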

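Since the term \big(1 - \frac{k}{r} \big) in the Schwarzschild metric becomes infinite at r = 0, it appears that we also have a singularity at this point. Unlike the singularity at r = k, this one cannot be removed by any change of coordinates, because coordinate-independent measures of curvature blow up there (for example, the Kretschmann scalar R_{abcd}R^{abcd} equals \frac{12k^2}{r^6} for the Schwarzschild metric). Viewed as a function of r, the term \big(1 - \frac{k}{r} \big) can in fact be recognised in terms of the earlier discussion above as having a pole of order 1 (also called a simple pole) at r = 0, since its Laurent series about r = 0 is -\frac{k}{r} + 1.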


Different possible branch cuts for the principal argument, principal logarithm and principal square root functions

For some work I was doing with a student, I was trying to find different ways of proving the familiar result that the complex square root function f(z) = \sqrt{z} is discontinuous everywhere on the negative real axis. As I was working on alternative proofs it became very clear to me how `sensitive’ all the proofs were to the particular definition of the principal argument I was using, namely that the principal argument \theta = \text{Arg}z is the unique argument of z satisfying -\pi < \theta \leq \pi. In a sense, this definition `manufactures’ the discontinuity of the complex square root function on the negative real axis, because the principal argument function itself is discontinuous here: the principal argument of a sequence of points approaching the negative real axis from above will tend to \pi, whereas the principal argument of a sequence approaching the same point on the negative real axis from below will tend to -\pi. I realised that all the proofs I was coming up with were exploiting this discontinuity of the principal argument function. However, this particular choice of principal argument function is completely arbitrary. An alternative could be to say that the principal argument of z is the unique argument satisfying 0 \leq \theta < 2\pi which we can call \text{Arg}_{2\pi} z. The effect of this choice of principal argument function is to make the complex square root function discontinuous everywhere on the positive real axis! It turns out that we can make any ray emanating from the origin the line of discontinuity for the complex square root function, simply by choosing different definitions of the principal argument function. The same applies to the complex logarithm function. In this note I want to record some of my thoughts about this.

The reason for having to specify principal argument functions in the first place is that we need to make complex functions of complex variables single-valued rather than multiple-valued, to make them well-behaved with regard to operations like differentiation. Specifying a principal argument function in order to make a particular complex function single-valued is called choosing a branch of the function. If we specify the principal argument function to be f(z) = \text{Arg} z where -\pi < \text{Arg} z \leq \pi then we define the principal branch of the logarithm function to be

\text{Log} z = \text{log}_e |z| + i \text{Arg} z

for z \in \mathbb{C} - \{0\}, and the principal branch of the square root function to be

z^{\frac{1}{2}} = \text{exp}\big(\frac{1}{2} \text{Log} z \big)

for z \in \mathbb{C} with z \neq 0.

If we define the functions \text{Log} z and z^{\frac{1}{2}} in this way they will be single-valued, but the cost of doing this is that they will not be continuous on the whole of the complex plane (essentially because of the discontinuity of the principal argument function, which both functions `inherit’). They will be discontinuous everywhere on the negative real axis. The negative real axis is known as a branch cut for these functions. Using this terminology, what I want to explore in this short note is the fact that different choices of branch for these functions will result in different branch cuts for them.

To begin with, let’s formally prove the discontinuity of the principal argument function f(z) = \text{Arg} z, z \neq 0, and then see how this discontinuity is `inherited’ by the principal logarithm and square root functions. For the purposes of the proof we can consider the sequence of points

z_n = |\alpha| \text{e}^{(-\pi + 1/n)i}

where

\alpha \in \{ x \in \mathbb{R}: x < 0 \}

Clearly, as n \rightarrow \infty, we have z_n \rightarrow -|\alpha| = \alpha. However,

f(z_n) = \text{Arg} \big( |\alpha| \text{e}^{(-\pi + 1/n)i}\big)

= -\pi + \frac{1}{n}

\rightarrow -\pi

whereas

f(\alpha) = \text{Arg}\big(|\alpha| \text{e}^{\pi i} \big) = \pi

Therefore f(z_n) \not \rightarrow f(\alpha), so the principal argument function is discontinuous at all points on the negative real axis.

Now consider how the following proof of the discontinuity of f(z) = z^{\frac{1}{2}} on the negative real axis depends crucially on the discontinuity of \text{Arg} z. We again consider the sequence of points

z_n = |\alpha| \text{e}^{(-\pi + 1/n)i}

where

\alpha \in \{ x \in \mathbb{R}: x < 0 \}

so that z_n \rightarrow -|\alpha| = \alpha. However,

f(z_n) = z_n^{\frac{1}{2}} = \text{exp}\big(\frac{1}{2} \text{Log} z_n \big)

= \text{exp}\big( \frac{1}{2} \text{log}_e |z_n| + \frac{1}{2} i \text{Arg} z_n \big)

= \text{exp}\big( \frac{1}{2} \text{log}_e |\alpha| + \frac{1}{2} i (- \pi + \frac{1}{n}) \big)

\rightarrow |\alpha|^{\frac{1}{2}} \text{e}^{-i \pi /2} = - i |\alpha|^{\frac{1}{2}}

whereas

f(\alpha) = \big( |\alpha| \text{e}^{i \pi}\big)^{\frac{1}{2}}

= |\alpha|^{\frac{1}{2}} \text{e}^{i \pi/2} = i |\alpha|^{\frac{1}{2}}

Therefore f(z_n) \not \rightarrow f(\alpha), so the principal square root function is discontinuous at all points on the negative real axis.
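Python's `cmath` module uses the same principal branch as the text: `cmath.phase` returns values in [-\pi, \pi] and `cmath.sqrt` has its branch cut along the negative real axis. On the axis itself the result depends on the sign of the zero imaginary part, so the sketch builds \alpha with a +0.0 imaginary part, which gives \text{Arg}\,\alpha = \pi as in the text. The value \alpha = -4 is an arbitrary test point.

```python
import cmath
import math

alpha = complex(-4.0, 0.0)             # on the negative real axis, Arg(alpha) = pi

for n in (1, 10, 1000, 10**6):
    z_n = cmath.rect(abs(alpha), -math.pi + 1 / n)    # |alpha| e^{(-pi + 1/n) i}, just below the axis
    print(n, cmath.phase(z_n), cmath.sqrt(z_n))         # Arg -> -pi, sqrt -> -2i

print(cmath.phase(alpha), cmath.sqrt(alpha))           # pi and 2i at alpha itself
```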

Now suppose we choose a different branch for the principal logarithm and square root functions, say \text{Arg}_{2\pi} z which as we said earlier satisfies 0 \leq \text{Arg}_{2\pi} z < 2\pi. The effect of this is to change the branch cut of these functions to the positive real axis! The reason is that the principal argument function will now be discontinuous everywhere on the positive real axis, and this discontinuity will again be `inherited’ by the principal logarithm and square root functions.

To prove the discontinuity of the principal argument function f(z) = \text{Arg}_{2\pi} z on the positive real axis we can consider the sequence of points

z_n = \alpha \text{e}^{(2 \pi - 1/n)i}

where

\alpha \in \{ x \in \mathbb{R}: x > 0 \}

We have z_n \rightarrow \alpha. However,

f(z_n) = \text{Arg}_{2\pi} \big(\alpha \text{e}^{(2\pi - 1/n)i}\big)

= 2\pi - \frac{1}{n}

\rightarrow 2\pi

whereas

f(\alpha) = \text{Arg}_{2\pi}(\alpha) = 0

Therefore f(z_n) \not \rightarrow f(\alpha), so the principal argument function is discontinuous at all points on the positive real axis.

We can now again see how the following proof of the discontinuity of f(z) = z^{\frac{1}{2}} on the positive real axis depends crucially on the discontinuity of \text{Arg}_{2\pi} z there. We again consider the sequence of points

z_n = \alpha \text{e}^{(2\pi - 1/n)i}


where \alpha \in \{ x \in \mathbb{R}: x > 0 \}

so that z_n \rightarrow \alpha. However,

f(z_n) = z_n^{\frac{1}{2}} = \text{exp}\big(\frac{1}{2} \text{Log} z_n \big)

= \text{exp}\big( \frac{1}{2} \text{log}_e |z_n| + \frac{1}{2} i \text{Arg}_{2\pi} z_n \big)

= \text{exp}\big( \frac{1}{2} \text{log}_e |\alpha| + \frac{1}{2} i (2 \pi - \frac{1}{n}) \big)

\rightarrow \alpha^{\frac{1}{2}} \text{e}^{i 2 \pi /2} = - \alpha^{\frac{1}{2}}


whereas

f(\alpha) = \alpha^{\frac{1}{2}}

Therefore f(z_n) \not \rightarrow f(\alpha), so the principal square root function is discontinuous at all points on the positive real axis.

There are infinitely many other branches to choose from. In general, if \tau is any real number, we can define the principal argument function to be f(z) = \text{Arg}_{\tau} z where

\tau \leq \text{Arg}_{\tau} z < \tau + 2\pi

and this will give rise to a branch cut for the principal logarithm and square root functions consisting of a ray (half-line) emanating from the origin and containing all those points z such that \text{arg}(z) = \tau modulo 2\pi.
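The general construction can be sketched in Python as follows, assuming z \neq 0. The helper names `arg_tau` and `sqrt_tau` are hypothetical; the idea is simply to shift `cmath.phase(z)` by whichever multiple of 2\pi lands it in [\tau, \tau + 2\pi). Setting \tau = 0 reproduces \text{Arg}_{2\pi} and the jump on the positive real axis derived above.

```python
import cmath
import math

TWO_PI = 2 * math.pi


def arg_tau(z: complex, tau: float) -> float:
    """Return the branch of arg z lying in [tau, tau + 2*pi); assumes z != 0."""
    # cmath.phase(z) lies in [-pi, pi]; shift it by a multiple of 2*pi.
    theta = tau + (cmath.phase(z) - tau) % TWO_PI
    # Floating-point rounding can land exactly on tau + 2*pi; fold it back.
    if theta >= tau + TWO_PI:
        theta -= TWO_PI
    return theta


def sqrt_tau(z: complex, tau: float) -> complex:
    """Square root exp(0.5 * Log_tau z), built on the Arg_tau branch."""
    return math.sqrt(abs(z)) * cmath.exp(0.5j * arg_tau(z, tau))


# tau = 0 gives Arg_{2pi}: approach alpha > 0 from just below the real axis.
alpha = 4.0  # arbitrary positive real number
for n in (10, 100, 1000):
    z_n = alpha * cmath.exp(1j * (TWO_PI - 1 / n))
    print(n, arg_tau(z_n, 0.0), sqrt_tau(z_n, 0.0))

print("Arg_0(alpha)  =", arg_tau(complex(alpha), 0.0))   # 0.0
print("sqrt_0(alpha) =", sqrt_tau(complex(alpha), 0.0))  # (2+0j)
```

Along the sequence the argument approaches 2\pi and the square root approaches -2, while at \alpha itself they are 0 and 2. With \tau = -\pi the values lie in [-\pi, \pi), which agrees with the usual principal argument everywhere except on the negative real axis itself, where the convention \tau \leq \text{Arg}_{\tau} z < \tau + 2\pi picks -\pi rather than \pi; the cut sits in the same place either way.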

A note on designing and implementing richer mathematical tasks

In this note I want to clarify for myself some aspects of designing and implementing richer mathematical tasks (i.e., tasks which involve more than just applying routine methods). For the purposes of discussion, closed tasks are taken to be those which have a clear goal and a unique answer; open tasks have a clear goal but more than one answer; investigative tasks are those which provide more scope for students to specify their own goals and research directions. These are to be distinguished from exercises which have one solution and a pre-specified solution method.

I have noticed numerous times in the past that many pupils suddenly become very engaged when I pose unusually challenging problems with unspecified and unclear (to them) methods of solution. These have typically been closed problems, but they differ from the normal exercises pupils are used to. I have tried these problems on both low- and high-attaining groups and noticed a surprising degree of enthusiasm in both cases. For example, at the end of a lesson on substitution in algebra, I might put up a very complicated-looking formula and ask the pupils what value of x would give the answer 4096 when substituted into it. After some initial expressions of puzzlement there will typically be a raised level of excitement and numerous suggestions will be made during animated discussions involving almost everyone in the class. Inevitably, someone will eventually find the correct answer by trial and error after some minutes. I have sometimes seen similar excitement among pupils faced with challenging problems in other mathematics teachers’ lessons.

In these situations, many pupils clearly became excited and intrigued by the more challenging tasks rather than being put off by them, including pupils who were usually difficult to motivate. I was sometimes amazed when solutions were found much more quickly than I expected, or using interesting or unexpected approaches, or by unexpected pupils. What seemed to be happening was that I was suddenly enabling pupils to let their mathematical creativity and intuition run free over what was terra incognita for them. This contrasted with other parts of the lesson in which I had essentially taken out most of the fun by getting them to follow pre-specified rules in answering routine exercises. They seemed to feel liberated by being thrust into relatively unknown territory and clearly enjoyed it, even competing with each other to be the first to work out an answer by any means possible. This reminded me of a passage in a book by the physicist Leonard Susskind (2008, p.151) in which he ponders whether or not to include more demanding equations in his popular physics books. His publisher advised him not to, warning him that every additional equation would mean ten thousand fewer books sold. Susskind ignored the advice (and his books still became bestsellers), saying: “Frankly, that goes against my experience. People like to be challenged; they just don’t like to be bored.”

This issue is a relevant one because there seems to be an unnecessary divergence in practice between Ofsted’s desire to see pupils being stimulated with mathematically richer tasks in the classroom, and heads of departments’ (HoDs’) desire to improve exam grades by ensuring pupils are well trained to answer short, closed, exam-type exercises. Often in practice the emphasis seems to be overwhelmingly on the latter, with one HoD describing his department to me as an ‘exam machine’. Teaching to the examination using routine closed exercises has been criticised by Ofsted (2009), and Ofsted instead praises schools (such as my old school – King Edward VI Camp Hill School for Boys – see Ofsted, 2011) which provide pupils with more open and investigative tasks that require creative use of mathematics over extended periods. In this note I want to get a better understanding of the nature of different types of mathematical tasks, and the extent to which effective tasks can be designed which simultaneously achieve both objectives.

Theories and views in previous literature

A large number of books, research articles and case studies have explored different aspects of enriching mathematics learning, and have documented the effects of different strategies on pupils. Hewitt (2002) introduced a useful distinction between arbitrary ideas in mathematics (those belonging to the realm of memorising things) and necessary ideas (those belonging to the realm of awareness). All students need to be informed of arbitrary ideas by someone else (e.g., that there are 360 degrees in a whole turn) but some students can then become ‘aware’ of necessary ideas by themselves (e.g., how many degrees there are in quarter-turns, half-turns, two-thirds turns, and so on). Hewitt argues that it is important for teachers to try to foster this ‘awareness’ rather than spoon-feeding necessary ideas as if they were arbitrary ones. This insight also has an important bearing on the design of mathematical tasks because it suggests that tasks involving more arbitrary ideas will involve more non-mathematical ‘memorising’ work by students. Given the limited time available in maths lessons, it would seem preferable to give pupils tasks which leave them as much time as possible for exploring and discovering necessary ideas by themselves. It might be useful to classify mathematical tasks in terms of the extent to which they involve arbitrary vs necessary ideas, with those minimising the former and maximising the latter being preferred.

Another important reading for the purposes of understanding and classifying different types of mathematical tasks is Skovsmose’s (2002) Landscapes of investigation, which is the term he uses for learning environments which can support investigative work, such as project work. He contrasts this investigative paradigm with the traditional exercise paradigm of the typical maths lesson which involves closed problems to be solved by pre-specified methods. He also introduces a distinction in mathematical problems between references to pure mathematics, references to a semi-reality (a ‘virtual reality’ invented for the purposes of an exercise), and real-life references. He then defines six learning milieus in terms of the possible combinations of the two paradigms and three possible degrees of reference to ‘reality’, noting that the bulk of current mathematics education involves switching between pure maths and semi-real contexts within the exercise paradigm. Attempting to redress the balance are efforts like NRICH (http://nrich.maths.org) and Bowland maths (http://www.bowlandmaths.org.uk/), which now provide a wealth of investigative activities in pure maths, semi-real and real-life contexts. The matrix of learning milieus provides a useful classification scheme for mathematical tasks and also a useful analytical tool with which to assess the mix of activities in one’s teaching. Skovsmose advocates a type of mathematics education which moves between the six different milieus as appropriate, sometimes focusing on deep investigative work, sometimes on consolidation work using exercises. Although highly stimulating, Skovsmose’s ideas do not help much with the question of how to combine exam preparation with richer mathematical activities. Exam preparation would presumably fall under the exercise paradigm. 
Moreover, in trying to apply Skovsmose’s scheme to real-life situations, it could be argued that the exercise vs investigative distinction is too coarse, and that there are types of activity which lie between these two extremes. In particular, there are closed problems with unspecified solution methods which are not really routine exercises, and there are open problems with many possible answers which are nevertheless not quite landscapes of investigation. It might be useful to envisage a slightly more flexible version of Skovsmose’s classification scheme which includes exercises, closed problems, open problems and landscapes of investigation. When combined with Skovsmose’s three references to reality this gives twelve learning milieus.

Another useful classification scheme involves a three-dimensional cube (reproduced in Figure 1 below) as a model for integrating three dimensions of applications of mathematics: the context within which mathematics is to be used; the mathematical processes that are to be used; and the mathematical content (concepts, facts and techniques) that is to be used.

Figure 1. Three dimensions of applying mathematics (Westwell and Ward-Penny, 2011, p. 26)

The model was introduced by Westwell and Ward-Penny (2011, p.26) to analyse how the three dimensions of any use or application of mathematics are either treated separately or embedded in various versions of the national curriculum. For example, they pointed out that a major change in the 2008 mathematics national curriculum was a separation of the second dimension (processes) from the third dimension (mathematical content) in order to emphasise the need to help pupils develop a wider set of thinking skills as well as their functional mathematics abilities. However, the same model can also be used to understand the nature of different types of mathematical tasks. Indeed, the context dimension is recognisable as the reference dimension in Skovsmose’s matrix of learning milieus. The other dimensions allow for a refinement of the classification of mathematical tasks according to the processes and mathematical content involved, so together with Hewitt’s and Skovsmose’s contributions there are now five criteria for classifying tasks.

The different ways in which tasks can be presented are also an important aspect of mathematical task design that affords yet another ‘degree of freedom’ in thinking about how to vary and enrich the learning experience of pupils. This is explored by Mason and Johnston-Wilder (2006), who provide a book-length treatment of issues surrounding the design and use of mathematical tasks. In Chapter 1 of their book they introduce the term dimensions-of-possible-variation to describe the scope for varying the numbers and other features in a mathematical task without altering the underlying structure. The authors say: “Altering the numbers in a task is the most obvious of several dimensions-of-possible-variation that transform a task from a single exercise into a class of problems or a ‘problem type’. Learners make progress when they become aware that what matters about a task is the method and the thinking involved, rather than the specific numbers. When they begin to think about a problem type, they are starting to think mathematically about tasks as well as within tasks.” (The term metacognition is often used in connection with the last point.) What makes the concept of dimensions-of-possible-variation particularly useful, however, is that it can be applied not only to tasks themselves, but also to ways of presenting tasks. This adds another dimension to the ways in which mathematical tasks can be varied to make them more interesting and challenging. To illustrate, in Chapter 3 of the book the authors explore various dimensions-of-possible-variation in the presentation of a classic task known as arithmogons. This is a versatile activity that consists of a network of connected circles and squares and only a single rule to remember: the number in each square must be the sum of the numbers in the circles on either side of it. Various arrangements can be used, as illustrated in Figure 2 below.

Figure 2. Arithmogons (Mason and Johnston-Wilder, 2006, p. 44)
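To make the single arithmogon rule concrete, here is a minimal Python sketch for the simplest arrangement, assumed here to be a triangle of three circles and three squares (Figure 2 may show other layouts). The hypothetical helper `do_triangle` fills in the squares from the circles, and `undo_triangle` reverses the calculation. `Fraction` is used so that square values with an odd total give exact fractional circles rather than rounded decimals.

```python
from fractions import Fraction


def do_triangle(a, b, c):
    """'Doing': each square between two circles holds their sum."""
    return (a + b, b + c, c + a)


def undo_triangle(ab, bc, ca):
    """'Undoing': recover the circles from the squares (integer inputs assumed).

    Adding the three squares counts every circle twice, so
    a + b + c = (ab + bc + ca) / 2, and each circle equals that total
    minus the square opposite it.
    """
    half_total = Fraction(ab + bc + ca, 2)
    return (half_total - bc, half_total - ca, half_total - ab)


print(do_triangle(3, 5, 8))  # (8, 13, 11)
a, b, c = undo_triangle(8, 13, 11)
print(a, b, c)  # 3 5 8
print(*undo_triangle(1, 1, 1))  # 1/2 1/2 1/2
```

The halving step relies on the triangle having an odd number of circles: with four circles in a ring, the squares no longer determine the circles uniquely, which is itself a natural ‘what if?’ extension.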

The authors discussed in detail a number of different ways of presenting this single task, including:

– doing and undoing calculations, as in Figure 2;
– starting from a non-school context to explain the activity (e.g., the authors suggested a story about an archaeological dig during which damaged clay tablets were discovered which had to be reconstructed using arithmogon-like techniques);
– distributed working and pooling resources;
– starting with a hard or complex version of an arithmogon, and encouraging pupils to make up simpler versions in order to ‘see how they work’ before tackling the more difficult one;
– asking pupils to say what is the same and what is different about two or three examples of arithmogons (a metacognition activity);
– starting in silence, i.e., the teacher completes a few examples of arithmogons in silence on the board, and then asks pupils to come up and offer similar examples.

These are all based on the same basic task, but the different ways of presenting it can provide different types of mathematical learning experiences for pupils. The concept of dimensions-of-possible-variation in the presentation of tasks is something to be borne in mind when assessing and classifying mathematical tasks, in addition to the five classification criteria identified earlier from Hewitt’s and Skovsmose’s insights and from the three-dimensional cube in Figure 1. Mason and Johnston-Wilder caution, however, that the degree of task variety that is provided in the classroom needs to be judged carefully. Too little variety in mathematical tasks can produce dependency (i.e., pupils find it difficult to deal with situations which are slightly different from the ‘norm’), but too much variety can be confusing and destabilising. As Skovsmose (2002) indicates in his article, there needs to be an ongoing dialogue and negotiation in the classroom between teacher and pupils to decide what mix of activity types is most appropriate.

Yet another aspect of mathematical task design that needs to be considered, and that can be difficult to get right, is the appropriate level of difficulty of the tasks. Useful guidance about this is provided in the context of level differentiation by QIA (2007, p. 25). The article outlines four dimensions of mathematical tasks which determine their difficulty:

– complexity of the situation (e.g., number of variables, the variety and amount of data, the way in which the situation is presented, etc.);
– familiarity of the situation to the pupil (non-routine tasks are more demanding for pupils than routine activities they are familiar with);
– technical demand of the maths required to solve the problem (tasks which involve more sophisticated mathematics are more demanding for pupils than those which require only elementary mathematics);
– the extent to which the pupil is expected to tackle the problem independently (guidance from the teacher or from the structuring of a task into successive parts will make the task easier than if no such guidance is provided).

This is a useful conceptual framework for mathematical task design, particularly for strategies involving modifying routine exercises to make them mathematically richer, while at the same time trying to avoid over-burdening pupils. For example, if a task is made more complex and non-routine and pupils are expected to work on it largely autonomously, it might be necessary to reduce the technical demands of the mathematics to compensate for the other dimensions of difficulty. This framework thus provides a systematic way to adjust levels of difficulty, akin to turning four dials to ‘fine-tune’ the overall difficulty. The level of difficulty provides a seventh criterion with which to classify different types of mathematical tasks.

A paper by Prestage and Perks (2007) is directly relevant to the question of how to combine exam preparation with more open and exploratory type work. It describes an approach for adapting and extending uninspiring tasks, such as short, closed exam-type questions, to make them mathematically richer. The approach involves:

(a) identifying the ‘givens’ in the task;
(b) changing, adding or removing a given;
(c) analysing the resulting maths including the choices for pupils and teachers;
(d) based on these explorations, choosing tasks which are appropriate for the classroom.

For example, Figure 3 below shows a typical question for work on algebraic substitution, with a range of suggestions for alternative tasks based on altering the ‘givens’. Prestage and Perks say: “Changing and removing givens reveals a wealth of different approaches to the mathematics and the potential for different but linked mathematics. The more you adapt and extend the more the different parts of the curriculum emerge…not as separate tasks but as continuous possibilities.” This approach seems promising as a technique for relatively quickly generating more interesting tasks for pupils as extensions of routine exam-type exercises, thus simultaneously satisfying the need both for exam preparation and for enriched mathematical learning.

Figure 3. Generating alternative tasks (Prestage and Perks, 2007, p. 386)

Although not discussed by Prestage and Perks, one can even envisage asking pupils themselves to generate alternative mathematical tasks using an approach like the one in Figure 3, thus making the exploratory experience student-led rather than teacher-led. Something similar to this had in fact been suggested already by Watson and Mason (2000). They advocated asking students to generate their own examples of such things as linear equations, alternative notations, and even their own problems for assessment purposes. In a powerful concluding paragraph, Watson and Mason say: “We believe that all students come to class with immense powers to construe, and that it is vital that the teacher uses tasks which call upon students to use those powers. Otherwise students may become trained in dependency on the teacher or text to do the thinking, the generalising and the particularising. Having to generate examples as part of learning about concepts is one way to avoid such dependency.”

Another paper in this vein by Cai and Brook (2006) also seems particularly helpful in terms of generating richer tasks for pupils. As well as advocating enhancing mathematical learning by asking students to pose new problems for themselves and to make generalisations, the paper also recommends asking students to generate, analyse and compare alternative solution approaches to problems. The authors say: “Regardless of whether there is individual or group effort, it is important for teachers to guide students to reflect and compare various solutions because the comparison helps students recognise similarities and differences between solutions and enhances students’ understanding.” This strategy is particularly appealing as a way of converting routine exercises or exam questions into more exploratory activities because the exercises do not even have to be modified. Simply asking students to find alternative methods of solution and to compare and contrast them can transform a routine exercise into a much more interesting task that is somewhat like the kind of activity undertaken by mathematical researchers, who often seek alternative proofs for theorems. I have found this strategy particularly useful in advanced work with high-attaining students, as indicated in my previous blog posts.

The term ‘richer’ is used broadly in this note to mean tasks which involve more than just routine applications of pre-specified methods. Whether they are closed, open or investigative, such richer tasks can be much more challenging and stimulating for pupils than routine drill exercises. However, attempts have been made in the literature to characterise the term rich task more narrowly, and more along the lines of what are being referred to as investigative tasks in the present note. A particularly useful study in this regard by Ahmed (1987, p. 20) suggested that the following should be regarded as the characteristic features of a ‘rich task’:

– it must be accessible to everyone at the start;
– it needs to allow further challenges and be extendible;
– it should invite children to make decisions;
– it should involve children in: speculating, hypothesis making and testing, proving or explaining, reflecting, interpreting;
– it should not restrict children from searching in other directions;
– it should promote discussion and communication;
– it should encourage originality/invention;
– it should encourage ‘what if?’ and ‘what if not?’ questions;
– it should have an element of surprise;
– it should be enjoyable.

This is a useful list of criteria with which to assess the extent to which particular mathematical tasks are ‘rich’. Similar lists of criteria for identifying rich tasks have been produced by others, for example Jenny Piggott, former director of NRICH (see McDonald and Watson, 2010, p. 4).

Other types of tasks, particularly closed tasks, should not be undervalued as vehicles for enriching mathematical learning, however. It can be argued that it is overly simplistic to regard open tasks and rich tasks as somehow ‘better’ than closed tasks. As indicated at the start of this note, closed tasks can also offer opportunities for richer mathematical experiences. There is some support for this view in the literature. For example, Foster (2011) argues in favour of using what he calls ‘closed but provocative’ questions to generate richer mathematical experiences in the classroom. These are closed questions which give rise to unexpected (or otherwise interesting) results, and thereby invite further investigation and questioning. He gives an example of such a closed question in the context of curves which happen to enclose a unit area, something which he says tends to surprise students and which then leads to further discussion and experimentation. The closed questions which I have experimented with myself were made more interesting by having unspecified and initially unclear solution methods, or by asking pupils to find alternative solution methods which were then compared and contrasted.

An interesting discussion in Foster (2013) on resisting pressures for reductionism in mathematics pedagogy helps to clarify many of the ideas discussed so far about enriching mathematics learning by putting them into the context of reductionist vs holistic teaching approaches. Reductionism is an approach to problem solving which involves breaking problems down into smaller and more manageable parts. Holism is an opposing idea that suggests some problems cannot be solved by looking at the constituent parts separately, because what matters is how the parts interact, not the parts themselves. Foster argues that mathematics pedagogy itself is becoming increasingly reductionist, and that while applying a reductionist approach to mathematical problems is often productive, applying the same approach to teaching mathematics can be bad for students. He recommends a more holistic approach to teaching maths, involving “genuine and substantial mathematical activities, which bring into play general mathematical strategies such as abstracting, representing, symbolizing, generalizing, proving, and formulating new questions.” (Foster, 2013, p. 577). The use of richer closed, open and investigative tasks can be regarded in the context of Foster’s article as a more holistic approach to teaching maths, and the enthusiastic response of many students to the challenging tasks discussed in the introduction can be interpreted as a response to this more holistic invitation to use their ingenuity and mathematical skills. Foster’s ideas also closely echo views expressed by Barnes (2002, p. 96) on how best to provide pupils with ‘magical moments’ of mathematical insight and discovery. Barnes says: “We might surmise, however, that ‘magical’ moments would be less likely to occur in expository teaching, where work on problems is preceded by carefully structured preparatory explanations and guided practice. If instruction progresses by small, simple steps, and the teacher anticipates difficulties and provides immediate clarification, students will have less need to struggle and less occasion to make efforts of their own to achieve understanding and insight.”

A number of articles have discussed issues relating to task design in the context of degree-level mathematics, and it is interesting to observe that concerns similar to those in secondary schools arise regarding the inadequacy of commonly used mathematical task strategies. For example, Breen and O’Shea (2010, 2011) survey the literature on the types of tasks assigned to university mathematics students and find that the majority require only imitative reasoning (i.e., reasoning requiring only memorisation or the use of well-rehearsed procedures) as opposed to creative reasoning (i.e., thoughtful novel reasoning backed up by suitable arguments from appropriate mathematical foundations). They express a familiar concern that “students have knowledge but are not in a position to use it in unfamiliar situations”, and recommend that lecturers should “scatter throughout a course a considerable number of problems for students to solve without first seeing very similar worked examples.”

A number of articles bring to light some of the challenges that teachers can face when trying to implement richer mathematical activities in the classroom. These challenges can come both from the school system and from pupils themselves, who have often been ‘trained’ for years to think of school maths as having to be done in a particular way and feel uncomfortable when attempts are made to expose them to richer mathematical experiences. In arguing for more holistic approaches to mathematics teaching, Foster (2013) attributes the rise of reductionist teaching approaches to a combination of assessment-driven and accountability-driven cultures in schools, which have the effect of de-professionalizing teachers and forcing them to ‘teach to the examination’ in bite-sized chunks. Foster says: “The senior teacher who ‘pops in’ might have time only for a 15-minute visit, in which they will make judgements that can have serious consequences for the teacher. So an understandable defensive strategy for the teacher against these intrusions is to break up the lesson into episodes of no longer than 15 minutes, during each of which some superficial public student assessment takes place, which no observer can fail to miss, and which highlights what students have achieved during this period.” These pressures often make it seem too risky to implement recommendations from mathematics education research involving richer and less fragmented classroom activities. Similar pressures are revealed in an article by Ward-Penny (2010), who was challenged by a colleague to justify his use of an ‘unorthodox’ classroom activity involving the construction of a geodesic dome using cocktail sticks and soft sweets. Ward-Penny says: “At first I tried to bluster my way to an answer – different types of triangles are on the syllabus, aren’t they? The emphasis on processes in the new edition of the national curriculum might also serve as a cover-all – after all, aren’t the students exploring and justifying why the geodesic dome cannot be constructed solely with flat, equilateral triangles? If that attempt at hand-waving failed to convince, I am sure I could justify the activity to any passing inspector by uttering the magic term ‘cross-curricular’. Upon reflection, though, I decided that there is another, and far better reason for using this activity: it is both mathematical, and fun.”

In an interesting article reflecting on her experiences of teaching mathematics in an exam-driven all-girls school, Hagan (2005) tells of similar challenges, but also highlights how pupils themselves can make it difficult to implement richer and more interesting mathematical activities. She says: “I believe that these students wanted to be taught quick, ‘easy to follow’ rules and procedures because success to them is not about understanding what they are doing in mathematics, but about knowing how to do a particular type of question in a particular way so that the marker will put that magical ‘A’ (for Achieved) in the space on the right hand side of the examination paper. Any grand ideas that I had of making their mathematics a connected and relevant experience were greeted with a total lack of enthusiasm from the students. What I had to remember was that some of these girls had been indoctrinated in the culture of the school so by the time I became their mathematics teacher I had many years of examination focus and pressure to battle against.”


Based on the above literature I feel that seven dimensions of variation in mathematical tasks come across as particularly important:

– the balance of arbitrary versus necessary ideas (Hewitt, 2002);
– the overall task paradigm: exercise, closed, open, or landscape of investigation (Skovsmose, 2002);
– the degree of reference to reality: pure mathematics, semi-reality, real-life (Skovsmose, 2002);
– mathematical content: number, algebra, geometry and measures, handling data (Westwell and Ward-Penny, 2011);
– mathematical processes: making decisions, communicating, reasoning (Westwell and Ward-Penny, 2011);
– mode of presentation (Mason and Johnston-Wilder, 2004);
– level of difficulty: complexity, familiarity of the situation, technical demands of the mathematics involved, degree of teacher support (QIA, 2007).

I have also gained a fuller appreciation of how useful closed tasks can be for generating rich mathematical learning experiences. Foster (2011) pointed out that there is a current tendency to regard closed tasks as somehow inferior to open and investigative ones in this regard. I agree with him that this is an overly simplistic attitude. I am particularly interested in the potential usefulness of closed questions because modifying routine closed exam-type exercises to make them mathematically richer and more interesting seems to be the best hope for simultaneously achieving the objectives of preparing students for exams and enriching their mathematical learning. HoDs, as well as parents, want pupils to be prepared for exams and this reality has to be accommodated somehow. Papers by Prestage and Perks (2007), Watson and Mason (2000) and Cai and Brook (2006) came across as particularly useful in terms of suggesting realistic avenues for extending the utility of closed tasks. In the past I have managed to use closed tasks to great effect to enrich my pupils’ learning experiences, particularly in the case of high-attaining students with whom I explored alternative solution approaches for advanced maths exam questions along the lines recommended by Cai and Brook (2006). The students thereby obtained greater familiarity with the exam questions they would be facing as well as richer mathematical experiences which will stand them in good stead in the future.


Ahmed, A, 1987. Better Mathematics: a curriculum development study based on the low attainers in mathematics project. HMSO, London.

Barnes, M, 2002. ‘Magical’ moments in mathematics: insights into the process of coming to know. In: L. Haggarty (ed.), Teaching mathematics in secondary schools: a reader. RoutledgeFalmer, London.

Breen, S, O’Shea, A, 2010. Mathematical thinking and task design. Irish Math. Soc. Bulletin 66, pp. 39-49.

Breen, S, O’Shea, A, 2011. The use of mathematical tasks to develop mathematical thinking skills in undergraduate calculus courses – a pilot study. In: Smith, C. (Ed.) Proceedings of the British Society for Research into Learning Mathematics 31(1).

Cai, J, Brook, M, 2006. Looking back in problem solving. Mathematics Teaching Incorporating Micromath, 196, pp. 42-45.

Foster, C, 2013. Resisting reductionism in mathematics pedagogy. The Curriculum Journal, 24(4), pp. 563-585. DOI: 10.1080/09585176.2013.828630.

Foster, C, 2014. Closed but provocative questions: Curves enclosing unit area. International Journal of Mathematical Education in Science and Technology. [online] http://www.foster77.co.uk [accessed 22 May 2015].

Hagan, F, 2005. Reflections on teaching mathematics in an exam-driven school: an autoethnography. [online] http://www.merga.net.au/documents/RP432005.pdf [accessed 02 June 2015].

Hewitt, D, 2002. Arbitrary and necessary: A way of viewing the mathematics curriculum. In: L. Haggarty (ed.), Teaching mathematics in secondary schools: a reader. RoutledgeFalmer, London.

Mason, J, Johnston-Wilder, S, 2004. Designing and using mathematical tasks. Open University, Milton Keynes.

McDonald, S, Watson, A, 2010. What’s in a task? Generating mathematically rich activity. [online] http://xtec.cat/centres/a8005072/articles/rich.pdf [accessed 22 May 2015].

Ofsted, 2009. Understanding the score: Improving practice in mathematics (secondary). [online] http://www.ofsted.gov.uk/resources/mathematics-understanding-score-improving-practice-mathematics-secondary [accessed 20 May 2015].

Ofsted, 2011. Engaging able mathematics students: King Edward VI Camp Hill School for Boys. [online] https://www.gov.uk/government/publications/engaging-more-able-mathematics-students [accessed 20 May 2015].

Prestage, S, Perks, P, 2007. Developing teacher knowledge using a tool for creating tasks for the classroom. Journal of Mathematics Teacher Education, 10, pp. 381-90.

QIA, 2007. Teaching and Learning Functional Mathematics. HMSO, London. [online] https://learn2.open.ac.uk/pluginfile.php/709853/mod_resource/content/1/teaching_and_learning_functional_maths_3A.pdf [accessed 22 May 2015].

Skovsmose, O, 2002. Landscapes of investigation. In: L. Haggarty (ed.), Teaching mathematics in secondary schools: a reader. RoutledgeFalmer, London.

Susskind, L, 2008. The Black Hole War. Back Bay Books, New York.

Ward-Penny, R, 2010. Making a meal out of mathematics. Mathematics Teaching, 219, pp. 22-23.

Watson, A, Mason, J, 2000. Student generated examples. Mathematics Teaching, 172, pp. 59-62.

Westwell, J, Ward-Penny, R, 2011. Mathematics in the national curriculum. In: Johnston-Wilder, S, Johnston-Wilder, P, Pimm, D, and Lee, C (eds.), Learning to teach mathematics in the secondary school. Routledge, New York.

A note on alternative approaches to solving a variable mass problem involving accretion

Variable mass problems are the archetypal problems of ‘rocket science’, in which aspects of the motion of a rocket need to be calculated as the rocket burns up its fuel. Variable mass problems also arise in situations in which a body gains mass by accretion, e.g., the way a hailstone (pictured) grows by condensation as it falls. The calculations are based on the impulse-momentum principle, which states that the change in linear momentum of a system in the time interval dt is equal to the impulse of the external forces acting on the system in that interval. This principle derives from Newton’s second law for systems of particles, expressed in terms of momentum (intuitively, F = \frac{dp}{dt} gives us dp = Fdt).

In this note I want to explore a variable mass problem involving a falling hailstone gaining mass by accretion, focusing on the derivation and solution of the differential equation of motion. I find the problem interesting because it illustrates that it is sometimes possible to solve for the required quantities by direct integration, without first finding a general solution of the differential equation. However, going through the longer process of first finding a general solution enables further insights to be obtained. In this note I will solve the problem in both ways, i.e., by direct integration and by finding a general solution of the differential equation, and I will use the latter to delve a bit deeper into the situation.

The problem is as follows (the reader should attempt to solve it before reading on):

A spherical hailstone is falling under gravity in still air. At time t the hailstone has speed v. The radius r increases by condensation at the rate \frac{dr}{dt} = kr where k is a constant. Ignoring air resistance, derive the equation of motion of the hailstone and find the time taken for the speed of the hailstone to increase from \frac{g}{9k} to \frac{g}{6k}.

It helps to draw a sketch of the situation, such as the following:


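Using the impulse-momentum principle (dp = Fdt) and noting that the external force here is the weight of the system, i.e., its mass times the acceleration due to gravity, we can write

(m + dm)(v + dv) - (mv + 0 \cdot dm) = (m + dm)g\,dt

where the term 0 \cdot dm records that the condensing moisture is at rest in still air before it joins the hailstone, so it contributes no momentum at the start of the interval.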

which (treating products of differentials as zero) simplifies to

mdv + vdm = mgdt

Note that the sign of the acceleration due to gravity on the right-hand side is positive because it acts in the same direction as the velocity of the falling hailstone (we take the downward direction as positive). Dividing through by dt gives

m\frac{dv}{dt} + v\frac{dm}{dt} = mg

At this point we can use the fact that the mass is proportional to the volume, with constant of proportionality \lambda (the density of the hailstone, assumed constant), so

m = \lambda\frac{4}{3}\pi r^3

and therefore

\frac{dm}{dt} = \lambda 4 \pi r^2 \frac{dr}{dt} = \lambda 4 \pi r^3k

Substituting these expressions for m and \frac{dm}{dt} into the equation of motion above gives
\lambda\frac{4}{3}\pi r^3 \frac{dv}{dt} + v \lambda 4 \pi r^3k = \lambda\frac{4}{3}\pi r^3g

which simplifies to

\frac{dv}{dt} + 3vk = g

This is the required equation of motion for the hailstone. As stated earlier, we can now proceed to find the time taken for the speed of the hailstone to increase from \frac{g}{9k} to \frac{g}{6k} either by direct integration, or by first solving the differential equation of motion to find an expression for v. The latter approach has the advantage that we can use the expression for v to obtain other insights. Using the direct integration approach, we would separate the variables and write

\int_0^T dt = \int_{\frac{g}{9k}}^{\frac{g}{6k}}\frac{dv}{g - 3vk}

and therefore the time taken for the speed of the hailstone to increase from \frac{g}{9k} to \frac{g}{6k} is

T = \big[-\frac{1}{3k}\ln(g-3vk)\big]_{\frac{g}{9k}}^{\frac{g}{6k}} = -\frac{1}{3k}\ln(\frac{3}{4})
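A minimal numerical sanity check of the direct-integration result, assuming the hypothetical values g = 9.81 and k = 0.5 (the problem itself keeps k general). The sketch evaluates the integral for T with composite Simpson's rule and compares it with -\frac{1}{3k}\ln(\frac{3}{4}).

```python
import math

# Hypothetical illustrative values: g in m/s^2, k in 1/s (not given in the problem)
g = 9.81
k = 0.5


def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


# Separating dv/dt + 3kv = g gives dt = dv / (g - 3kv)
T_numeric = simpson(lambda v: 1.0 / (g - 3 * k * v), g / (9 * k), g / (6 * k))
T_formula = -math.log(3 / 4) / (3 * k)
print(T_numeric, T_formula)  # the two values agree to many decimal places
```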

To get the same answer by first solving the differential equation to obtain an expression for v, we would observe that the general solution of a first-order linear differential equation of the form

\frac{dy}{dx} + Py = Q

is

y = e^{-I}\int Qe^I dx + ce^{-I}

where

I = \int P dx

Here we have the differential equation

\frac{dv}{dt} + 3vk = g

so

I = \int 3k dt = 3kt

and therefore

v = e^{-3kt} \int g e^{3kt} dt + ce^{-3kt} = e^{-3kt} g \frac{1}{3k} e^{3kt} + ce^{-3kt} = \frac{g}{3k} + ce^{-3kt}

So the general solution is

v = \frac{g}{3k} + ce^{-3kt}

where c is an arbitrary constant.
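The general solution can be checked against a direct numerical integration of the unsimplified momentum equation m\frac{dv}{dt} + v\frac{dm}{dt} = mg, with m = \lambda\frac{4}{3}\pi r^3 and \frac{dr}{dt} = kr. This is a minimal sketch using the classical fourth-order Runge-Kutta method; the density, initial radius, initial speed, g and k are all hypothetical values chosen for illustration. Setting t = 0 in the general solution fixes c = v_0 - \frac{g}{3k}.

```python
import math

# Hypothetical illustrative values (the problem itself keeps these general)
g, k = 9.81, 0.5      # gravity (m/s^2) and growth constant (1/s)
lam = 900.0           # assumed density lambda of the hailstone (kg/m^3)
r0, v0 = 1e-3, 0.0    # assumed initial radius (m) and speed (m/s)


def derivs(r, v):
    """Right-hand sides of dr/dt = kr and m dv/dt + v dm/dt = mg."""
    m = lam * 4 / 3 * math.pi * r**3
    dr = k * r
    dm = lam * 4 * math.pi * r**2 * dr   # dm/dt from m = lambda (4/3) pi r^3
    dv = g - v * dm / m
    return dr, dv


def rk4(r, v, t_end, n=2000):
    """Classical fourth-order Runge-Kutta for the pair (r, v) on [0, t_end]."""
    h = t_end / n
    for _ in range(n):
        k1 = derivs(r, v)
        k2 = derivs(r + h / 2 * k1[0], v + h / 2 * k1[1])
        k3 = derivs(r + h / 2 * k2[0], v + h / 2 * k2[1])
        k4 = derivs(r + h * k3[0], v + h * k3[1])
        r += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        v += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return r, v


t_end = 2.0
_, v_numeric = rk4(r0, v0, t_end)
c = v0 - g / (3 * k)  # setting t = 0 in the general solution fixes c
v_formula = g / (3 * k) + c * math.exp(-3 * k * t_end)
print(v_numeric, v_formula)
```

Integrating the momentum equation rather than the simplified form \frac{dv}{dt} + 3vk = g means the check also covers the step in which \lambda and r cancel.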

Now, when v = \frac{g}{9k}, we have

\frac{g}{3k} + ce^{-3kt} = \frac{g}{9k}

and solving for t gives

t_1 = -\frac{1}{3k}\ln(-\frac{2g}{9ck})

Similarly, when v = \frac{g}{6k}, we have

\frac{g}{3k} + ce^{-3kt} = \frac{g}{6k}

and solving for t gives

t_2 = -\frac{1}{3k}\ln(-\frac{g}{6ck})

Then the time taken is

T = t_2 - t_1 = -\frac{1}{3k}\big[\ln(-\frac{g}{6ck}) - \ln(-\frac{2g}{9ck})\big] = -\frac{1}{3k}\ln(\frac{3}{4})

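as before. Note that the arbitrary constant c disappears in the course of the calculation. For the logarithms in t_1 and t_2 to be defined, c must be negative; this is automatic here, because ce^{-3kt} = v - \frac{g}{3k} is negative at both of the speeds considered. For example, a hailstone that starts from rest has c = -\frac{g}{3k}.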
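To see concretely that c drops out, the sketch below solves \frac{g}{3k} + ce^{-3kt} = v for t at both speeds, for a few negative values of c, assuming the same hypothetical g = 9.81 and k = 0.5. The individual times t_1 and t_2 depend on c (they can even be negative when the hailstone's speed at t = 0 already exceeds the speeds of interest), but their difference does not.

```python
import math

g, k = 9.81, 0.5  # hypothetical illustrative values


def time_at_speed(v, c):
    """Solve g/(3k) + c*exp(-3kt) = v for t (needs c < 0 and v < g/(3k))."""
    return -math.log((v - g / (3 * k)) / c) / (3 * k)


T_expected = -math.log(3 / 4) / (3 * k)
results = []
for c in (-g / (3 * k), -1.0, -50.0):
    t1 = time_at_speed(g / (9 * k), c)
    t2 = time_at_speed(g / (6 * k), c)
    results.append((c, t1, t2, t2 - t1))
    print(f"c = {c:8.3f}  t1 = {t1:8.4f}  t2 = {t2:8.4f}  T = {t2 - t1:.6f}")
```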

Although this latter approach seems more long-winded, it has the advantage that we can now use the general solution of the differential equation we have obtained to explore other features of the problem. For example, although it is not specifically required in the original problem, we could work out how far the hailstone travelled in the time it took for its speed to increase from \frac{g}{9k} to \frac{g}{6k}. To do this, we would rewrite the general solution of the differential equation obtained above as

\frac{dx}{dt} = \frac{g}{3k} + ce^{-3kt}

Separating the variables and integrating on both sides then gives the following expression for the position of the hailstone as a function of time:

x(t) = \frac{gt}{3k} - \frac{c}{3k}e^{-3kt} + d

where d is a second arbitrary constant. At the time t_1 calculated above we have

x(t_1) = -\frac{g}{(3k)^2}\ln(-\frac{2g}{9ck}) + \frac{c}{3k}\frac{2g}{9ck} + d

= -\frac{g}{(3k)^2}\ln(-\frac{2g}{9ck}) + \frac{2}{3}\frac{g}{(3k)^2} + d

and at the time t_2 calculated above we have

x(t_2) = -\frac{g}{(3k)^2}\ln(-\frac{g}{6ck}) + \frac{c}{3k}\frac{g}{6ck} + d

= -\frac{g}{(3k)^2}\ln(-\frac{g}{6ck}) + \frac{1}{2}\frac{g}{(3k)^2} + d

Therefore the distance travelled by the hailstone in the time it took for its speed to increase from \frac{g}{9k} to \frac{g}{6k} is

X = x(t_2) - x(t_1) = -\frac{g}{(3k)^2}\big(\frac{1}{6} + \ln(\frac{3}{4})\big)

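Note that both the arbitrary constants c and d have disappeared in the course of the calculation. The result is positive, as it should be, because \ln(\frac{3}{4}) \approx -0.288 is less than -\frac{1}{6}.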
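The distance formula can be cross-checked by integrating v(t) numerically between t_1 and t_2. This is a minimal sketch assuming the hypothetical values g = 9.81 and k = 0.5, and choosing c = -\frac{g}{3k} (a hailstone starting from rest); since c cancels in the formula, any negative c would do.

```python
import math

g, k = 9.81, 0.5     # hypothetical illustrative values
c = -g / (3 * k)     # assumed: the hailstone starts from rest, so v(0) = 0


def speed(t):
    """General solution v = g/(3k) + c exp(-3kt)."""
    return g / (3 * k) + c * math.exp(-3 * k * t)


t1 = -math.log(-2 * g / (9 * c * k)) / (3 * k)
t2 = -math.log(-g / (6 * c * k)) / (3 * k)

# Distance travelled = integral of v dt over [t1, t2], by composite Simpson's rule
n = 1000
h = (t2 - t1) / n
inner = sum((4 if i % 2 else 2) * speed(t1 + i * h) for i in range(1, n))
X_numeric = (speed(t1) + speed(t2) + inner) * h / 3
X_formula = -g / (3 * k) ** 2 * (1 / 6 + math.log(3 / 4))
print(X_numeric, X_formula)
```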

Alternative approaches to solving advanced level vector problems

Trying to find more than one solution for a given problem can be an effective way to convert routine exercises or exam questions into exploratory ‘research’ type activities which are more open-ended and fun. Getting students to generate, analyse and compare alternative solutions to problems has long been recommended in the mathematics education literature (see, e.g., Cai, J., Brook, M., 2006, Looking back in problem solving, Mathematics Teaching Incorporating Micromath, 196, pp. 42-45). I recently explored the potential for this approach in the context of advanced level vector problems and want to document some alternative approaches I found for answering some of these problems here. A typical advanced level vector problem (in fact, an exam question) is as follows (the reader should attempt this question before reading on):

With respect to a fixed origin O, the line l has equation

\mathbf{r} = \begin{pmatrix}13\\8\\1\end{pmatrix} + \lambda\begin{pmatrix}2\\2\\-1\end{pmatrix}

where \lambda is a scalar parameter. The point A lies on l and has coordinates (3, -2, 6).
The point P has position vector (-p \mathbf{i} + 2p \mathbf{k}) relative to O where p is a constant.
Given that vector \overrightarrow{PA} is perpendicular to l,
(a) find the value of p.

Given also that B is a point on l such that \angle BPA = 45^{\circ},
(b) find the coordinates of the two possible positions of B.

The following is a sketch of the scenario.


Let \overrightarrow{OA} = \mathbf{a}, \overrightarrow{OB} = \mathbf{b}, and \overrightarrow{OP} = \mathbf{p}. The interesting part of this question is (b), but with regard to (a) we have

\overrightarrow{PA} = \mathbf{a} - \mathbf{p} = \begin{pmatrix}3\\-2\\6\end{pmatrix} - \begin{pmatrix}-p\\\text{0}\\2p\end{pmatrix} = \begin{pmatrix}3+p\\-2\\6-2p\end{pmatrix}

Since \overrightarrow{PA} is perpendicular to l, the dot product with the direction vector of l must be zero, so we must have

\begin{pmatrix}3+p\\-2\\6-2p\end{pmatrix} \cdot \begin{pmatrix}2\\2\\-1\end{pmatrix} = 6 + 2p - 4 - 6 + 2p = 0

so p = 1. It follows that

\mathbf{p} = \begin{pmatrix}-p\\\text{0}\\2p\end{pmatrix} = \begin{pmatrix}-1\\\text{0}\\2\end{pmatrix}

and

\overrightarrow{PA} = \begin{pmatrix}3+p\\-2\\6-2p\end{pmatrix} = \begin{pmatrix}4\\-2\\4\end{pmatrix}
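Part (a) can be reproduced in a few lines. The sketch below uses the observation that \overrightarrow{PA} \cdot (2, 2, -1) is linear in p, so evaluating it at p = 0 and p = 1 gives its coefficients; the helper names `dot` and `pa` are hypothetical, introduced only for this illustration.

```python
def dot(u, v):
    """Scalar product of two 3-vectors given as tuples."""
    return sum(a * b for a, b in zip(u, v))


a = (3, -2, 6)   # position vector of A
d = (2, 2, -1)   # direction vector of l


def pa(p):
    """PA = a - p, where P has position vector (-p, 0, 2p)."""
    return (a[0] + p, a[1] - 0, a[2] - 2 * p)


# PA(p).d is linear in p: write it as f0 + f1*p and solve f0 + f1*p = 0
f0 = dot(pa(0), d)
f1 = dot(pa(1), d) - f0
p = -f0 / f1
print(p, pa(p))  # 1.0 (4.0, -2, 4.0)
```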

With regard to part (b), the most straightforward approach is to observe that since \overrightarrow{AB} is collinear with l (both A and B lie on l), the position vector of B relative to O must be of the form

\mathbf{b} = \mathbf{a} + \overrightarrow{AB} = \begin{pmatrix}3\\-2\\6\end{pmatrix} + \mu\begin{pmatrix}2\\2\\-1\end{pmatrix}

where \mu is a parameter to be found. The length of \overrightarrow{AB} = \mathbf{b} - \mathbf{a} = \mu\begin{pmatrix}2\\2\\-1\end{pmatrix} is |\mu| \sqrt{2^2 + 2^2 + (-1)^2} = 3|\mu|. Now, |\overrightarrow{PA}| = \sqrt{4^2 + (-2)^2 + 4^2} = \sqrt{36} = 6. The triangle PAB in the sketch above has a right angle at A (because \overrightarrow{PA} is perpendicular to l) and \angle BPA = 45^{\circ}, so \angle PBA = 45^{\circ} as well and the triangle is isosceles, with |\overrightarrow{AB}| = |\overrightarrow{PA}|. Hence 3|\mu| = 6 and so |\mu| = 2. Substituting \mu = 2 into the expression for \mathbf{b} above gives the coordinates of B in the sketch above as (7, 2, 4). However, the sketch above also shows another possible position for B, to the left of A. By symmetry, the coordinates of this other possible position for B can be found by setting \mu = -2 (since in this case the vector \overrightarrow{AB} points to the left of A rather than to the right). Substituting \mu = -2 into the expression for \mathbf{b} above gives the other possible coordinates of B as (-1, -6, 8). This solves part (b).
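The isosceles-triangle argument translates directly into code: |\mu| = |\overrightarrow{PA}| / |(2, 2, -1)|, and the two positions of B correspond to the two signs of \mu. This is a minimal sketch; the variable names are illustrative only.

```python
import math

a = (3, -2, 6)    # A
d = (2, 2, -1)    # direction of l
pa = (4, -2, 4)   # PA from part (a), with p = 1


def norm(u):
    """Euclidean length of a 3-vector."""
    return math.sqrt(sum(x * x for x in u))


# Right angle at A plus 45 degrees at P makes triangle PAB isosceles, so |AB| = |PA|
mu = norm(pa) / norm(d)   # 6 / 3 = 2
positions = [tuple(ai + s * mu * di for ai, di in zip(a, d)) for s in (1, -1)]
print(positions)  # [(7.0, 2.0, 4.0), (-1.0, -6.0, 8.0)]
```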

The question now arises as to whether there is any other way of solving part (b) of this problem. I found the following alternative approach, which I think is instructive (although perhaps algebraically more complicated) in that it uses the cosine form of the scalar product. Since the angle between \overrightarrow{PA} and \overrightarrow{PB} is 45^{\circ}, we must have

\overrightarrow{PA} \cdot \overrightarrow{PB} = |\overrightarrow{PA}||\overrightarrow{PB}|\cos 45^{\circ} = \frac{1}{\sqrt{2}}|\overrightarrow{PA}||\overrightarrow{PB}|

Since B is on the line l, it must have coordinates of the form (13+2\theta, 8+2\theta, 1-\theta). As we found in part (a) above, we have (in row vector form)

\overrightarrow{PA} = (4, -2, 4)

and

\overrightarrow{PB} = (13+2\theta, 8+2\theta, 1-\theta) - (-1, 0, 2) = (14+2\theta, 8+2\theta, -1-\theta)

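so

\overrightarrow{PA} \cdot \overrightarrow{PB} = 56 + 8\theta - 16 - 4\theta - 4 - 4\theta = 36

The parameter \theta cancels here because \overrightarrow{PB} = \overrightarrow{PA} + \overrightarrow{AB} and \overrightarrow{PA} \cdot \overrightarrow{AB} = 0, so \overrightarrow{PA} \cdot \overrightarrow{PB} = |\overrightarrow{PA}|^2 = 36 for every B on l. On the other hand, we also have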

\overrightarrow{PA} \cdot \overrightarrow{PB} = \frac{1}{\sqrt{2}}|\overrightarrow{PA}||\overrightarrow{PB}| = \frac{1}{\sqrt{2}}\cdot6\sqrt{(14+2\theta)^2 + (8+2\theta)^2 + (1+\theta)^2} = 3\sqrt{2}\sqrt{9\theta^2 + 90\theta + 261}

Equating these two we get

3\sqrt{2}\sqrt{9\theta^2 + 90\theta + 261} = 36

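Dividing both sides by 3\sqrt{2} gives \sqrt{9\theta^2 + 90\theta + 261} = 6\sqrt{2}, and squaring gives 9\theta^2 + 90\theta + 261 = 72, which simplifies to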

\theta^2 + 10\theta + 21 = 0

which factorises as

(\theta + 3)(\theta + 7) = 0

The solutions to the quadratic are therefore \theta = -3 and \theta = -7. Since B must have coordinates of the form (13+2\theta, 8+2\theta, 1-\theta), substituting the two possible solutions for \theta gives us the two possible coordinates of B as (7, 2, 4) and (-1, -6, 8) as before.
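A quick check of the alternative approach: solve \theta^2 + 10\theta + 21 = 0 with the quadratic formula, form the corresponding points B on l, and confirm that \angle BPA really is 45^{\circ} in both cases. This is an illustrative sketch; the helper names are hypothetical.

```python
import math

P = (-1, 0, 2)
A = (3, -2, 6)


def on_l(theta):
    """Point of l with parameter theta."""
    return (13 + 2 * theta, 8 + 2 * theta, 1 - theta)


def sub(u, v):
    return tuple(x - y for x, y in zip(u, v))


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


# Roots of theta^2 + 10 theta + 21 = 0 by the quadratic formula
b, c = 10, 21
disc = b * b - 4 * c
roots = [(-b + s * math.sqrt(disc)) / 2 for s in (1, -1)]

pa = sub(A, P)
checks = []
for theta in roots:
    B = on_l(theta)
    pb = sub(B, P)
    angle = math.degrees(math.acos(dot(pa, pb) / (norm(pa) * norm(pb))))
    checks.append((theta, B, angle))
    print(theta, B, round(angle, 6))
```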

I performed similar exercises with other advanced level vector problems and was able to find alternative solution approaches which were usually instructive in some way, although also usually more algebraically demanding than the most straightforward approach available. Another example I want to record here concerns the following advanced level vector problem – also an exam question (again, the reader should attempt to solve this problem before reading my discussion of it below):

With respect to a fixed origin O, the lines l_1 and l_2 are given by the equations

l_1: \mathbf{r} = \begin{pmatrix}6\\-3\\-2\end{pmatrix} + \lambda\begin{pmatrix}-1\\2\\3\end{pmatrix}

and

l_2: \mathbf{r} = \begin{pmatrix}-5\\15\\3\end{pmatrix} + \mu\begin{pmatrix}2\\-3\\1\end{pmatrix}

where \lambda and \mu are scalar parameters.

(a) Show that l_1 and l_2 meet and find the position vector of their point of intersection A.

(b) Find, to the nearest 0.1^{\circ}, the acute angle between l_1 and l_2.

The point B has position vector \begin{pmatrix}5\\-1\\1\end{pmatrix}

(c) Show that B lies on l_1.

(d) Find the shortest distance from B to the line l_2.

The most interesting part of this question for me is part (d), but for part (a) we observe that if the two lines meet, there must be values of \lambda and \mu for which the first coordinates agree, i.e., 6 - \lambda = -5 + 2\mu, or equivalently

2\mu + \lambda = 11

and also -3 + 2\lambda = 15 - 3\mu or equivalently

3\mu + 2\lambda = 18

Solving these two simultaneously gives \lambda = 3 and \mu = 4. If the two lines do indeed meet, these values for \lambda and \mu will be consistent with equality of the third coordinates of l_1 and l_2 at the point of intersection. Using \lambda = 3 to find the third coordinate for l_1 and using \mu = 4 to find the third coordinate for l_2, we find that the third coordinate is 7 in both cases, so the two lines do indeed meet. The position vector of their point of intersection A is then

\begin{pmatrix}6\\-3\\-2\end{pmatrix} + 3\begin{pmatrix}-1\\2\\3\end{pmatrix} = \begin{pmatrix}3\\3\\7\end{pmatrix}

or equivalently

\begin{pmatrix}-5\\15\\3\end{pmatrix} + 4\begin{pmatrix}2\\-3\\1\end{pmatrix} = \begin{pmatrix}3\\3\\7\end{pmatrix}
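The procedure for part (a) (solve two coordinate equations simultaneously, then check the third) can be sketched with exact rational arithmetic. Cramer's rule is used here for the 2 \times 2 system; that is an illustrative choice rather than the method written out above, and it gives the same \lambda = 3, \mu = 4.

```python
from fractions import Fraction

p1, d1 = (6, -3, -2), (-1, 2, 3)    # l1: r = p1 + lambda*d1
p2, d2 = (-5, 15, 3), (2, -3, 1)    # l2: r = p2 + mu*d2

# Equate the first two coordinates: lambda*d1[i] - mu*d2[i] = p2[i] - p1[i], i = 0, 1
a11, a12, b1 = d1[0], -d2[0], p2[0] - p1[0]
a21, a22, b2 = d1[1], -d2[1], p2[1] - p1[1]
det = a11 * a22 - a12 * a21               # nonzero, so the 2x2 system has a unique solution
lam = Fraction(b1 * a22 - a12 * b2, det)  # Cramer's rule
mu = Fraction(a11 * b2 - b1 * a21, det)

# The lines meet only if the third coordinates also agree
point1 = tuple(p + lam * d for p, d in zip(p1, d1))
point2 = tuple(p + mu * d for p, d in zip(p2, d2))
print(lam, mu, point1 == point2, [int(x) for x in point1])  # 3 4 True [3, 3, 7]
```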

For part (b) we find the dot product of the direction vectors of the two lines. We get

(-1, 2, 3) \cdot (2, -3, 1) = -5

but also

(-1, 2, 3) \cdot (2, -3, 1) = |(-1, 2, 3)||(2, -3, 1)|\cos\theta = \sqrt{14}\sqrt{14}\cos\theta = 14\cos\theta

Equating the two gives

\cos\theta = \frac{-5}{14}

so \theta = \arccos(\frac{-5}{14}) = 110.9248324^{\circ}, and therefore the acute angle must be 180^{\circ} - \theta = 69.1^{\circ} (to nearest 0.1^{\circ}).
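The angle calculation for part (b) in code: the angle between the direction vectors comes out obtuse, and since two lines make both \theta and 180^{\circ} - \theta with each other, the acute angle is the smaller of the two. A minimal sketch using only the standard `math` module:

```python
import math

d1 = (-1, 2, 3)   # direction of l1
d2 = (2, -3, 1)   # direction of l2

dot = sum(a * b for a, b in zip(d1, d2))
lengths = math.sqrt(sum(a * a for a in d1)) * math.sqrt(sum(b * b for b in d2))
theta = math.degrees(math.acos(dot / lengths))   # angle between the direction vectors
acute = min(theta, 180 - theta)                  # the lines make both angles
print(dot, round(theta, 7), round(acute, 1))
```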

For part (c) we observe that if B lies on l_1, there is a \lambda such that

\begin{pmatrix}5\\-1\\1\end{pmatrix} = \begin{pmatrix}6-\lambda\\-3+2\lambda\\-2+3\lambda\end{pmatrix}

By inspection, the required value is \lambda = 1.

Finally, for part (d) it is helpful to draw the following sketch of the situation:


We know that point A has coordinates (3, 3, 7) and point B has coordinates (5, -1, 1), and therefore the distance from B to A is

|\overrightarrow{BA}| = |(3, 3, 7) - (5, -1, 1)| = |(-2, 4, 6)| = \sqrt{(-2)^2 + 4^2 + 6^2} = \sqrt{56}

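The shortest distance from B to the line l_2 is the length of the perpendicular from B, which is shown in the sketch as meeting the line l_2 at a point C. Since B lies on l_1 and A is the point where l_1 and l_2 meet, triangle ABC has a right angle at C, and its angle at A is one of the angles between the two lines. The sine of an angle equals the sine of its supplement, so it makes no difference whether this is the acute or the obtuse angle, and the required shortest distance |\overrightarrow{BC}| can be obtained from simple trigonometry as

|\overrightarrow{BC}| = |\overrightarrow{BA}|\sin69.1^{\circ} = \sqrt{56}\sin69.1^{\circ} \approx 6.99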

This solves part (d). Again, we can ask whether there is an alternative way to obtain this shortest distance. I found the following alternative approach, which avoids using trigonometry at the expense of being algebraically more cumbersome. A point on the line l_2 must have coordinates of the form

(-5 + 2\mu, 15 - 3\mu, 3 + \mu)

The distance between B and any such point is given by

|(5, -1, 1) - (-5 + 2\mu, 15 - 3\mu, 3 + \mu)| = |(10 - 2\mu, -16 + 3\mu, -2 - \mu)|

= \sqrt{(10 - 2\mu)^2 + (-16 + 3\mu)^2 + (-2 - \mu)^2}

= \sqrt{14\mu^2 - 132\mu + 360}

= \sqrt{14(\mu - \frac{33}{7})^2 + \frac{342}{7}}

Since the squared term is never negative, the distance is smallest when \mu = \frac{33}{7}, and the shortest distance is \sqrt{\frac{342}{7}} \approx 6.99, which agrees with the previous result.
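Both methods for part (d) can be compared numerically. The sketch below computes the trigonometric answer using the exact \sin\theta = \sqrt{1 - \cos^2\theta} (which avoids the rounding in \sin 69.1^{\circ} and works for either angle), the vertex of the quadratic 14\mu^2 - 132\mu + 360, and, as an extra check, the distance from B to the foot of the perpendicular C on l_2. Helper names are illustrative.

```python
import math

A = (3, 3, 7)
B = (5, -1, 1)
d1, d2 = (-1, 2, 3), (2, -3, 1)


def sub(u, v):
    return tuple(x - y for x, y in zip(u, v))


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


# Method 1: |BC| = |BA| sin(angle between the lines), with sin taken from cos
cos_t = dot(d1, d2) / (norm(d1) * norm(d2))
dist_trig = norm(sub(A, B)) * math.sqrt(1 - cos_t**2)

# Method 2: minimise |BC|^2 = 14 mu^2 - 132 mu + 360 at the vertex of the parabola
qa, qb, qc = 14, -132, 360
mu_star = -qb / (2 * qa)                        # 33/7
dist_quad = math.sqrt(qc - qb**2 / (4 * qa))    # sqrt(342/7)

# Direct check: foot of the perpendicular C on l2 and its distance from B
C = (-5 + 2 * mu_star, 15 - 3 * mu_star, 3 + mu_star)
dist_direct = norm(sub(B, C))
print(round(dist_trig, 4), round(dist_quad, 4), round(dist_direct, 4))
```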

Polygonal numbers as quadratic sequences

I was observing a maths lesson about sequences in a sixth form college. This was pitched at a fairly low level, focusing primarily on arithmetic sequences, but a student was having a go at some challenge problems and got stuck on the problem of finding the nth term of the sequence 1, 3, 6, 10, 15, … The teacher asked me to help him. This is a quadratic sequence with first differences 2, 3, 4, 5, … and second differences 1, 1, 1, 1, … The procedure for quadratic sequences is to halve the (constant) second difference and use the result as the coefficient of n^2. Subtracting this multiple of n^2 from each term of the original sequence leaves an arithmetic sequence, whose nth term supplies the remaining terms of the formula. Here the answer is \frac{1}{2}(n^2 + n).
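The classroom procedure can be written as a short function. This is a sketch, assuming the terms are given for n = 1, 2, 3, … and that the second differences are constant; `quadratic_nth_term` is a hypothetical name. Exact fractions keep coefficients such as \frac{1}{2} exact.

```python
from fractions import Fraction


def quadratic_nth_term(seq):
    """Return (a, b, c) with u_n = a n^2 + b n + c for n = 1, 2, 3, ...

    Halve the constant second difference to get a, subtract a n^2 from each
    term, then read b and c off the remaining arithmetic sequence.
    """
    first = [y - x for x, y in zip(seq, seq[1:])]
    second = [y - x for x, y in zip(first, first[1:])]
    if len(set(second)) != 1:
        raise ValueError("need at least 3 terms with constant second differences")
    a = Fraction(second[0], 2)
    residual = [u - a * n * n for n, u in enumerate(seq, start=1)]
    b = residual[1] - residual[0]   # common difference of the linear part
    c = residual[0] - b             # value the linear part would take at n = 0
    return a, b, c


print(quadratic_nth_term([1, 3, 6, 10, 15]))
```

For the triangular numbers this returns a = b = \frac{1}{2} and c = 0, i.e. \frac{1}{2}(n^2 + n).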

After the lesson I continued to think about this problem and realised that the sequence 1, 3, 6, 10, 15, … is the sequence of triangular numbers. The Pythagoreans associated certain sequences of numbers with polygons, so we have triangular numbers, square numbers, pentagonal numbers, hexagonal numbers, etc. The relationship between these numbers and their associated polygons is easier to see when the numbers are represented as dots, as in the following diagram:


I noticed that the next polygonal numbers, the square numbers, are also a quadratic sequence, with nth term given by n^2. I wondered if all the polygonal numbers are quadratic sequences.

I could see that the pentagonal numbers 1, 5, 12, 22, 35, 51, … also form a quadratic sequence, with first differences 4, 7, 10, 13, 16, … and second differences 3, 3, 3, 3, …, so, halving the second difference, we see that the nth term of the sequence must involve \frac{3}{2}n^2. The following table then reveals what the remaining term must be:


Therefore the nth term of the pentagonal number sequence is \frac{1}{2}(3n^2 - n).
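Written out for the first four terms, the subtraction step looks like this. The remainders form the arithmetic sequence -\frac{1}{2}, -1, -\frac{3}{2}, -2, \ldots, whose nth term is -\frac{n}{2}, which gives \frac{3}{2}n^2 - \frac{1}{2}n = \frac{1}{2}(3n^2 - n):

```latex
\begin{aligned}
n = 1:&\quad 1 - \tfrac{3}{2}(1)^2 = -\tfrac{1}{2}\\
n = 2:&\quad 5 - \tfrac{3}{2}(2)^2 = -1\\
n = 3:&\quad 12 - \tfrac{3}{2}(3)^2 = -\tfrac{3}{2}\\
n = 4:&\quad 22 - \tfrac{3}{2}(4)^2 = -2
\end{aligned}
```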

It turns out that all the polygonal number sequences are quadratic sequences. To convince myself of this and to derive a general formula, I produced the following table in Excel showing the first 20 terms of the polygonal number sequences from the triangular to the icosagonal (20-sided polygon):


Looking at this table, I noticed that the triangular number sequence which the student had asked me about in the lesson I observed is actually the key to deriving a general formula for the nth term of any polygonal number sequence. Specifically, the table shows that the nth term for the p-gonal sequence is just the nth term for the triangular sequence plus (p - 3) times the (n - 1)th term of the triangular sequence.
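A table like the one described can be regenerated in code and the observed relationship checked for every entry. This sketch assumes the standard description of the p-gonal numbers, consistent with the differences found above: the sequence starts at 1, its first difference is p - 1, and each later difference grows by p - 2. The function names are hypothetical.

```python
def polygonal(p, n_terms):
    """First n_terms p-gonal numbers, built from first term 1,
    first difference p - 1 and constant second difference p - 2."""
    terms, term, step = [], 1, p - 1
    for _ in range(n_terms):
        terms.append(term)
        term += step
        step += p - 2
    return terms


def triangular(n):
    """n-th triangular number, with T_0 = 0."""
    return n * (n + 1) // 2


# First 20 terms for p = 3 (triangular) up to p = 20 (icosagonal)
table = {p: polygonal(p, 20) for p in range(3, 21)}

# Observed relation: n-th p-gonal number = T_n + (p - 3) * T_(n-1)
relation_holds = all(
    table[p][n - 1] == triangular(n) + (p - 3) * triangular(n - 1)
    for p in table
    for n in range(1, 21)
)
print(table[5][:6], table[6][8], relation_holds)  # [1, 5, 12, 22, 35, 51] 153 True
```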

For example, the 9th term (n = 9) for the hexagonal sequence (p = 6) is

\frac{1}{2}(9^2 + 9) + (6 - 3)\frac{1}{2}(8^2 + 8) = 45 + 108 = 153

In general, the nth term for the p-gonal sequence is

\frac{1}{2}(n^2 + n) + (p - 3)\frac{1}{2}((n - 1)^2 + (n - 1))

which simplifies to

\frac{n^2(p - 2) - n(p - 4)}{2}
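As a consistency check, substituting particular values of p into the general formula recovers the sequences met earlier in this note, including the 9th hexagonal number computed above:

```latex
\begin{aligned}
p = 3:&\quad \frac{n^2(1) - n(-1)}{2} = \frac{1}{2}(n^2 + n)\\
p = 4:&\quad \frac{n^2(2) - n(0)}{2} = n^2\\
p = 5:&\quad \frac{n^2(3) - n(1)}{2} = \frac{1}{2}(3n^2 - n)\\
p = 6:&\quad \frac{n^2(4) - n(2)}{2} = 2n^2 - n, \quad\text{so for } n = 9:\ 2(81) - 9 = 153
\end{aligned}
```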