A problem involving the use of exterior derivatives of differential forms to re-express the classical gradient, curl and divergence operations

Modern differential geometry makes extensive use of differential forms and the concept of exterior derivatives of differential forms developed by the French mathematician Élie Cartan (1869-1951). A Wikipedia article about exterior derivatives of differential forms can be found here. As alluded to in this article, exterior derivatives of differential forms encompass a lot of results usually expressed in terms of vector fields in classical vector calculus. In particular, there is a duality between 1-forms, 2-forms and vector fields which allows the classical gradient, curl and divergence operations of vector calculus to be fully subsumed within the realm of exterior derivatives. In the present note I want to briefly explore how these three differentiation operations of vector calculus can be replaced with Cartan’s exterior derivative. The necessary notation and motivation for this are nicely encapsulated in the following problem which appears in Barrett O’Neill’s Elementary Differential Geometry book (Revised Second Edition, p.33):

[Image: the problem statement from O’Neill, which sets up a correspondence (1) between 1-forms \sum f_i \, dx_i and vector fields V = \sum f_i U_i and a correspondence (2) between 2-forms f_1 \, dx_2 dx_3 + f_2 \, dx_3 dx_1 + f_3 \, dx_1 dx_2 and vector fields, and asks one to show that (a) df corresponds to \text{grad} f under (1), (b) d\phi corresponds to \text{curl} V under (2) when \phi \leftrightarrow V under (1), and (c) d\eta = (\text{div} V) dx dy dz when \eta \leftrightarrow V under (2)]

This problem was also the subject of an interesting Mathematics Stack Exchange discussion which can be found here. The reader should attempt to solve the problem on their own before reading my solution below.

To solve part (a), we use the fact that if f is a differentiable real-valued function on \mathbb{R}^3 and \bold{v}_p is a tangent vector with point of application \bold{p} and vector part \bold{v}, then the differential df of f is the 1-form such that

df(\bold{v}_p) = \sum v_i \frac{\partial f}{\partial x_i}(\bold{p}) = \sum \frac{\partial f}{\partial x_i}(\bold{p}) dx_i(\bold{v}_p)

(where the last equality uses the fact that the differentials of the natural coordinate functions evaluated at a tangent vector are equal to the coordinates v_i of the vector part of the tangent vector). But using the correspondence (1) between 1-forms and vector fields in the problem we can then write

df(\bold{v}_p) = \sum \frac{\partial f}{\partial x_i}(\bold{p}) dx_i(\bold{v}_p) \stackrel{\mathrm{(1)}}{\longleftrightarrow} \sum \frac{\partial f}{\partial x_i}(\bold{p}) U_i(\bold{p}) = \text{grad } f(\bold{p})

(where the U_i(\bold{p}) are the natural frame field vectors at the point of application \bold{p}). Therefore we have shown that

df \stackrel{\mathrm{(1)}}{\longleftrightarrow} \text{grad } f

I emphasised a specific tangent vector argument \bold{v}_p in the above solution but I will not do this in the solutions for (b) and (c) as the notation becomes too cumbersome. To solve part (b), we consider the 1-form

\phi = f_1 dx_1 + f_2 dx_2 + f_3 dx_3

The exterior derivative of \phi is the 2-form

d \phi = df_1 \wedge dx_1 + df_2 \wedge dx_2 + df_3 \wedge dx_3

=

\big(\frac{\partial f_1}{\partial x_1} dx_1 + \frac{\partial f_1}{\partial x_2} dx_2 + \frac{\partial f_1}{\partial x_3} dx_3 \big) \wedge dx_1

+ \big(\frac{\partial f_2}{\partial x_1} dx_1 + \frac{\partial f_2}{\partial x_2} dx_2 + \frac{\partial f_2}{\partial x_3} dx_3 \big) \wedge dx_2

+ \big(\frac{\partial f_3}{\partial x_1} dx_1 + \frac{\partial f_3}{\partial x_2} dx_2 + \frac{\partial f_3}{\partial x_3} dx_3 \big) \wedge dx_3

=

-\frac{\partial f_1}{\partial x_2} dx_1 dx_2 - \frac{\partial f_1}{\partial x_3} dx_1 dx_3

+ \frac{\partial f_2}{\partial x_1} dx_1 dx_2 - \frac{\partial f_2}{\partial x_3} dx_2 dx_3

+ \frac{\partial f_3}{\partial x_1} dx_1 dx_3 + \frac{\partial f_3}{\partial x_2} dx_2 dx_3

= \big( \frac{\partial f_2}{\partial x_1} - \frac{\partial f_1}{\partial x_2} \big) dx_1 dx_2 + \big( \frac{\partial f_3}{\partial x_1} - \frac{\partial f_1}{\partial x_3} \big) dx_1 dx_3 + \big( \frac{\partial f_3}{\partial x_2} - \frac{\partial f_2}{\partial x_3} \big) dx_2 dx_3

But using the correspondence (2) between 2-forms and vector fields in the problem we can then write

d \phi = \big( \frac{\partial f_2}{\partial x_1} - \frac{\partial f_1}{\partial x_2} \big) dx_1 dx_2 + \big( \frac{\partial f_3}{\partial x_1} - \frac{\partial f_1}{\partial x_3} \big) dx_1 dx_3 + \big( \frac{\partial f_3}{\partial x_2} - \frac{\partial f_2}{\partial x_3} \big) dx_2 dx_3

\stackrel{\mathrm{(2)}}{\longleftrightarrow}

\big( \frac{\partial f_3}{\partial x_2} - \frac{\partial f_2}{\partial x_3} \big) U_1 + \big( \frac{\partial f_1}{\partial x_3} - \frac{\partial f_3}{\partial x_1} \big) U_2 + \big( \frac{\partial f_2}{\partial x_1} - \frac{\partial f_1}{\partial x_2} \big) U_3

= \text{curl } V

Therefore we have shown that

d \phi \stackrel{\mathrm{(2)}}{\longleftrightarrow} \text{curl } V

Finally, to solve part (c) we can consider the 2-form

\eta = f_1 dy dz + f_2 dz dx + f_3 dx dy

which has a correspondence with the vector field V = \sum f_i U_i of the type (2) in the problem, that is,

\eta \stackrel{\mathrm{(2)}}{\longleftrightarrow} V

The exterior derivative of \eta is the 3-form

d \eta = df_1 \wedge dy dz + df_2 \wedge dz dx + df_3 \wedge dx dy

Since any product of differentials containing the same differential twice vanishes, and since dy\, dz\, dx = dz\, dx\, dy = dx\, dy\, dz (each being an even permutation of the others), we see immediately that this reduces to

d \eta = \big( \frac{\partial f_1}{\partial x} dx \big) dy dz + \big( \frac{\partial f_2}{\partial y} dy \big) dz dx + \big( \frac{\partial f_3}{\partial z} dz \big) dx dy

= \big(\frac{\partial f_1}{\partial x} + \frac{\partial f_2}{\partial y} + \frac{\partial f_3}{\partial z} \big) dx dy dz

= (\text{div } V) dx dy dz
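As a footnote to the above, the identity d(d\omega) = 0 satisfied by the exterior derivative corresponds under these dualities to the classical identities \text{curl}(\text{grad} f) = \bold{0} and \text{div}(\text{curl} V) = 0. The following minimal SymPy sketch (the particular fields f and V are arbitrary choices of mine) confirms the two classical identities symbolically:

```python
import sympy as sp
from sympy.vector import CoordSys3D, gradient, curl, divergence

R = CoordSys3D('R')

# arbitrary sample fields for the check
f = R.x**2 * sp.sin(R.y) * R.z
V = sp.sin(R.x*R.y)*R.i + R.y*R.z**2*R.j + sp.exp(R.x)*R.z*R.k

# d(df) = 0 corresponds to curl(grad f) = 0
w = curl(gradient(f))
print([sp.simplify(w.dot(e)) for e in (R.i, R.j, R.k)])  # [0, 0, 0]

# d(d phi) = 0 corresponds to div(curl V) = 0
print(sp.simplify(divergence(curl(V))))  # 0
```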


Proof that a Gauss sum associated with a quadratic character mod p is the same as a quadratic Gauss sum

In this note I want to quickly record a solution I have found to the following problem: If p is an odd prime such that p \nmid n and \chi(r) = (r|p), prove that

G(n, \chi) = \sum_{r \text{mod } p} (r|p) e^{2 \pi i n r/p} = \sum_{r = 1}^p e^{2 \pi i n r^2/p} = G(n ; p)

My solution is as follows. We have

G(n, \chi) = \sum_{r \text{mod } p} (r|p) e^{2 \pi i n r/p}

= \sum_{\substack{\text{quadratic}\\\text{residues }r}} e^{2 \pi i n r/p} - \sum_{\substack{\text{quadratic}\\\text{non-residues }s}} e^{2 \pi i n s/p}

whereas

G(n ; p) = \sum_{r = 1}^p e^{2 \pi i n r^2/p}

= \sum_{r = 1}^{p-1} e^{2 \pi i n r^2/p} + 1

= 2\sum_{\substack{\text{quadratic}\\\text{residues }r}} e^{2 \pi i n r/p} + 1

(since as r ranges over 1, \ldots, p - 1 it runs through both quadratic residues and non-residues, so r^2 ranges over the quadratic residues, taking each value exactly twice)

= \sum_{\substack{\text{quadratic}\\\text{residues }r}} e^{2 \pi i n r/p}

+ \big \{ \sum_{\substack{\text{quadratic}\\ \text{residues }r}} e^{2 \pi i n r/p} + 1 \big \}

Therefore G(n, \chi) = G(n; p) if and only if

- \sum_{\substack{\text{quadratic}\\\text{non-residues }s}} e^{2 \pi i n s/p}

= \sum_{\substack{\text{quadratic}\\ \text{residues }r}} e^{2 \pi i n r/p} + 1

\iff

\sum_{\substack{\text{quadratic}\\ \text{residues }r}} e^{2 \pi i n r/p} + \sum_{\substack{\text{quadratic}\\ \text{non-residues }s}} e^{2 \pi i n s/p} = -1

But this is true because

\sum_{\substack{\text{quadratic}\\ \text{residues }r}} e^{2 \pi i n r/p} + \sum_{\substack{\text{quadratic}\\ \text{non-residues }s}} e^{2 \pi i n s/p} = \sum_{r=1}^{p-1} e^{2 \pi inr/p}

This is a geometric sum of the form

S = \sum_{r=1}^{p-1} x^r = x + x^2 + \cdots + x^{p-1}

where

x = e^{2 \pi in/p}

Therefore

xS = x^2 + x^3 + \cdots + x^p

Subtracting the expression for S from this we get

(x - 1)S = x^p - x = 1 - x

(the last equality is true because x is a p-th root of unity; note also that x \neq 1 since p \nmid n, so we may divide through by x - 1)

\iff

S = -1

Therefore G(n, \chi) = G(n ; p).
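As a quick numerical sanity check, the following short Python sketch (using SymPy's legendre_symbol; the values p = 11 and n = 3 are arbitrary choices with p \nmid n) confirms that the two sums agree:

```python
import cmath
from sympy import legendre_symbol

p, n = 11, 3  # arbitrary odd prime p and n with p not dividing n

G_chi = sum(legendre_symbol(r, p) * cmath.exp(2j * cmath.pi * n * r / p)
            for r in range(1, p))
G_quad = sum(cmath.exp(2j * cmath.pi * n * r * r / p) for r in range(1, p + 1))

print(abs(G_chi - G_quad) < 1e-9)  # True
```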

On the classification of singularities, with an application to non-rotating black holes

In mathematics a singularity is a point at which a mathematical object (e.g., a function) is not defined or behaves ‘badly’ in some way. Singularities can be isolated (e.g., removable singularities, poles and essential singularities) or nonisolated (e.g., branch cuts). For teaching purposes, I want to delve into some of the mathematical aspects of isolated singularities in this note using simple examples involving the complex sine function. I will not consider nonisolated singularities in detail. These are briefly discussed with some examples in this Wikipedia page. I will also briefly look at how singularities arise in the context of black hole physics in a short final section.


Definition: A function f has an isolated singularity at the point \alpha if f is analytic on a punctured open disc \{z: 0 < |z - \alpha| < r \}, where r > 0, but not at \alpha itself.

Note that a function f is analytic at a point \alpha if it is differentiable on a region containing \alpha. Strangely, a function can have a derivative at a point without being analytic there. For example, the function f(z) = |z|^2 has a derivative at z = 0 but at no other point, as can easily be verified using the Cauchy-Riemann equations. Therefore this function is not analytic at z = 0 (or indeed anywhere else). Also note with regard to the definition of an isolated singularity that the function MUST be analytic on the ‘whole’ of the punctured open disc for the singularity to be defined. For example, despite appearances, the function

f(z) = \frac{1}{\sqrt{z}}

does not have an isolated singularity at z = 0 because it is impossible to define a punctured open disc centred at 0 on which f(z) is analytic (the principal square root z \rightarrow \sqrt{z} is discontinuous everywhere on the negative real axis, so f(z) fails to be analytic there).

I find it appealing that all three types of isolated singularity (removable, poles and essential singularities) can be illustrated by using members of the following family of functions:

f(z) = \frac{\sin(z^m)}{z^n}

where m, n \in \mathbb{Z}. For example, if m = n = 1 we get

f_1(z) = \frac{\sin(z)}{z}

which has a removable singularity at z = 0. If m = 1, n = 3 we get

f_2(z) = \frac{\sin(z)}{z^3}

which has a pole of order 2 at z = 0. Finally, if m = -1, n = 0 we get

f_3(z) = \sin\big( \frac{1}{z} \big)

which has an essential singularity at z = 0. In each of these three cases, the function is not analytic at z = 0 but is analytic on a punctured open disc with centre 0, e.g., \{z: 0 < |z| < 1\} or indeed \mathbb{C} - \{0\} (which can be thought of as a punctured disc with infinite radius). In what follows I will use these three examples to delve into structural definitions of the three types of singularity. I will then explore their classification using Laurent series expansions.

Structural definitions of isolated singularities

Removable singularities

Suppose a function f is analytic on the punctured open disc

\{z: 0 < |z - \alpha| < r\}

and has a singularity at \alpha. The function f has a removable singularity at \alpha if there is a function g which is analytic at \alpha such that

f(z) = g(z) for 0 < |z - \alpha| < r

We can see that g extends the analyticity of f to include \alpha, so we say that g is an analytic extension of f to the disc

\{z: |z - \alpha| < r \}

With removable singularities we always have that \lim_{z \rightarrow \alpha} f(z) exists since

\lim_{z \rightarrow \alpha} f(z) = g(\alpha)

(this will not be true for the other types of singularity) and the name of this singularity comes from the fact that we can effectively ‘remove’ the singularity by defining f(\alpha) = g(\alpha).

To apply this to the function

f_1(z) = \frac{\sin(z)}{z}

we first observe that the Maclaurin series expansion of \sin(z) is

\sin(z) = z - \frac{z^3}{3!} + \frac{z^5}{5!} - \frac{z^7}{7!} + \cdots for z \in \mathbb{C}

Therefore we can write

f_1(z) = 1 - \frac{z^2}{3!} + \frac{z^4}{5!} - \frac{z^6}{7!} + \cdots for z \in \mathbb{C} - \{0\}

If we then set

g(z) = 1 - \frac{z^2}{3!} + \frac{z^4}{5!} - \frac{z^6}{7!} + \cdots for z \in \mathbb{C}

we see that g(z) extends the analyticity of f_1(z) to include z = 0. We also see that

\lim_{z \rightarrow 0} f_1(z) = g(0)

Therefore f_1(z) has a removable singularity at z = 0.

Poles of order k, k > 0

Suppose a function f is analytic on the punctured open disc

\{z: 0 < |z - \alpha| < r\}

and has a singularity at \alpha. The function f has a pole of order k at \alpha if there is a function g, analytic at \alpha with g(\alpha) \neq 0, such that

f(z) = \frac{g(z)}{(z - \alpha)^k} for 0 < |z - \alpha| < r

With poles of order k we always have that

f(z) \rightarrow \infty as z \rightarrow \alpha

(which distinguishes them from removable singularities)

and

\lim_{z \rightarrow \alpha} (z - \alpha)^k f(z)

exists and is nonzero (since \lim_{z \rightarrow \alpha} (z - \alpha)^k f(z) = g(\alpha) \neq 0).

To apply this to the function

f_2(z) = \frac{\sin(z)}{z^3}

we first observe that

f_2(z) = \frac{\sin(z)/z}{z^2} = \frac{g(z)}{z^2} for z \in \mathbb{C} - \{0\}

where g is the function

g(z) = 1 - \frac{z^2}{3!} + \frac{z^4}{5!} - \frac{z^6}{7!} + \cdots for z \in \mathbb{C}

Since g(0) = 1 \neq 0, we see that f_2(z) behaves like \frac{1}{z^2} near z = 0 and

f_2(z) \rightarrow \infty as z \rightarrow 0

so the singularity at z = 0 is not removable. We also see that

\lim_{z \rightarrow 0} z^2 f_2(z) = g(0) = 1

Therefore the function f_2(z) has a pole of order 2 at z = 0.

Essential singularities

Suppose a function f is analytic on the punctured open disc

\{z: 0 < |z - \alpha| < r\}

and has a singularity at \alpha. The function f has an essential singularity at \alpha if the singularity is neither removable nor a pole. Such a singularity cannot be removed in any way, including by multiplying by any (z - \alpha)^k, hence the name.

With essential singularities we have that

\lim_{z \rightarrow \alpha} f(z)

does not exist, and f(z) does not tend to infinity as z \rightarrow \alpha.

To apply this to the function

f_3(z) = \sin\big( \frac{1}{z}\big)

we observe that if we restrict the function to the real axis and consider a sequence of points

z_n = \frac{2}{(2n + 1) \pi}

then we have that z_n \rightarrow 0 whereas

f_3(z_n) = \sin\big(\frac{(2n + 1) \pi}{2}\big) = (-1)^n

Therefore

\lim_{z \rightarrow 0} f_3(z)

does not exist, so the singularity is not removable, but it is also the case that

f_3(z) \not \rightarrow \infty as z \rightarrow 0

so the singularity is not a pole. Since it is neither a removable singularity nor a pole, it must be an essential singularity.
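These limiting behaviours are easy to check computationally. The following small SymPy sketch reproduces the three cases just discussed:

```python
import sympy as sp

z = sp.symbols('z')
n = sp.symbols('n', integer=True, nonnegative=True)

# removable singularity: the limit at 0 exists
print(sp.limit(sp.sin(z)/z, z, 0))            # 1

# pole of order 2: z**2 f_2(z) has a finite non-zero limit
print(sp.limit(z**2 * sp.sin(z)/z**3, z, 0))  # 1

# essential singularity: along z_n = 2/((2n+1)*pi) the values oscillate
z_n = 2/((2*n + 1)*sp.pi)
print([sp.sin(1/z_n).subs(n, k) for k in range(4)])  # [1, -1, 1, -1]
```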

Classification of isolated singularities using Laurent series

By Laurent’s Theorem, a function f which is analytic on an open annulus

A = \{z: 0 \leq r_1 < |z - \alpha| < r_2 \leq \infty \}

[Figure: the annulus A, the open region between concentric circles of radii r_1 and r_2 centred at \alpha]

(shown in the diagram) can be represented as an extended power series of the form

f(z) = \sum_{n = -\infty}^{\infty} a_n(z - \alpha)^n

= \cdots + \frac{a_{-2}}{(z - \alpha)^2} + \frac{a_{-1}}{(z - \alpha)} + a_0 + a_1 (z - \alpha) + a_2 (z - \alpha)^2 + \cdots

for z \in A, which converges at all points in the annulus. It is an ‘extended’ power series because it involves negative powers of (z - \alpha). (The part of the power series involving negative powers is often referred to as the singular part. The part involving non-negative powers is referred to as the analytic part). This extended power series representation is the Laurent series about \alpha for the function f on the annulus A. Laurent series are also often used in the case when A is a punctured open disc, in which case we refer to the series as the Laurent series about \alpha for the function f.

The Laurent series representation of a function on an annulus A is unique. We can often use simple procedures, such as finding ordinary Maclaurin or Taylor series expansions, to obtain an extended power series and we can feel safe in the knowledge that the power series thus obtained must be the Laurent series.

Laurent series expansions can be used to classify singularities by virtue of the following result: If a function f has a singularity at \alpha and if its Laurent series expansion about \alpha is

f(z) = \sum_{n = -\infty}^{\infty} a_n(z - \alpha)^n

then

(a) f has a removable singularity at \alpha iff a_n = 0 for all n < 0;

(b) f has a pole of order k at \alpha iff a_n = 0 for all n < -k and a_{-k} \neq 0;

(c) f has an essential singularity at \alpha iff a_n \neq 0 for infinitely many n < 0.

To apply this to our three examples, observe that the function

f_1(z) = \frac{\sin(z)}{z}

has a singularity at 0 and its Laurent series expansion about 0 is

\frac{\sin(z)}{z} = 1 - \frac{z^2}{3!} + \frac{z^4}{5!} - \frac{z^6}{7!} + \cdots

for z \in \mathbb{C} - \{0\}. This has no non-zero coefficients in its singular part (i.e., it only has an analytic part) so the singularity is a removable one.

The function

f_2(z) = \frac{\sin(z)}{z^3}

has a singularity at 0 and its Laurent series expansion about 0 is

\frac{\sin(z)}{z^3} = \frac{1}{z^2} - \frac{1}{3!} + \frac{z^2}{5!} - \cdots

for z \in \mathbb{C} - \{0\}. This has a_n = 0 for all n < -2 and a_{-2} \neq 0, so the singularity in this case is a pole of order 2.

Finally, the function

f_3(z) = \sin\big( \frac{1}{z} \big)

has a singularity at 0 and its Laurent series expansion about 0 is

\sin \big(\frac{1}{z} \big) = \frac{1}{z} - \frac{1}{3! z^3} + \frac{1}{5! z^5} - \cdots

for z \in \mathbb{C} - \{0\}. This has a_n \neq 0 for infinitely many n < 0 so the singularity here is an essential singularity.
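The three expansions above are easy to reproduce with SymPy's series machinery (a small sketch; for \sin(1/z) I expand \sin(w) and substitute w = 1/z, since the expansion point is the essential singularity itself):

```python
import sympy as sp

z, w = sp.symbols('z w')

# removable singularity: no negative powers appear
print(sp.series(sp.sin(z)/z, z, 0, 7))     # 1 - z**2/6 + z**4/120 - z**6/5040 + O(z**7)

# pole of order 2: the lowest power is z**(-2)
print(sp.series(sp.sin(z)/z**3, z, 0, 5))  # z**(-2) - 1/6 + z**2/120 - z**4/5040 + O(z**5)

# essential singularity: infinitely many negative powers
print(sp.sin(w).series(w, 0, 8).removeO().subs(w, 1/z))
# -1/(5040*z**7) + 1/(120*z**5) - 1/(6*z**3) + 1/z
```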

Singularities in Schwarzschild black holes

One often hears about singularities in the context of black hole physics and I wanted to quickly look at singularities in the particular case of non-rotating black holes. A detailed investigation of the various singularities that appear in exact solutions of Einstein’s field equations was conducted in the 1960s and 1970s by Penrose, Hawking, Geroch and others. See, e.g., this paper by Penrose and Hawking. There is now a vast literature on this topic. The following discussion is just my own quick look at how the ideas might arise.

The spacetime of a non-rotating spherical black hole is usually analysed using the Schwarzschild solution of the Einstein field equations for an isolated spherical mass m. In spherical coordinates this is the metric

\Delta \tau = \bigg[ \big(1 - \frac{k}{r}\big) (\Delta t)^2 - \frac{1}{c^2} \bigg\{\frac{(\Delta r)^2}{\big(1 - \frac{k}{r}\big)} + r^2(\Delta \theta)^2 + r^2 \sin^2 \theta (\Delta \phi)^2\bigg\} \bigg]^{1/2}

where

k = \frac{2mG}{c^2} and m is the mass of the spherically symmetric static object exterior to which the Schwarzschild metric applies. If we consider only radial motion (i.e., world lines for which \Delta \theta = \Delta \phi = 0) the Schwarzschild metric simplifies to

(\Delta \tau)^2 = \big(1 - \frac{k}{r}\big) (\Delta t)^2 - \frac{1}{c^2}\frac{(\Delta r)^2}{\big(1 - \frac{k}{r}\big)}

We can see that the \Delta r term in the metric becomes infinite at r = k so there is apparently a singularity here. However, this singularity is ‘removable’ by re-expressing the metric in a new set of coordinates, r and t^{\prime}, known as the Eddington-Finkelstein coordinates. The transformed metric has the form

(\Delta \tau)^2 = \big(1 - \frac{k}{r}\big) (\Delta t^{\prime})^2 - \frac{2k \Delta t^{\prime} \Delta r}{cr} - \frac{(\Delta r)^2}{c^2}\big(1 + \frac{k}{r}\big)

which does not behave badly at r = k. In general relativity, this type of removable singularity is known as a coordinate singularity. Another example is the apparent singularity at the poles (latitude \pm 90^{\circ}) in spherical coordinates, which disappears when a different coordinate system is used.

Since the term \big(1 - \frac{k}{r} \big) in the Schwarzschild metric becomes infinite at r = 0, it appears that we also have a singularity at this point. This is not a removable singularity and can in fact be recognised in terms of the earlier discussion above as a pole of order 1 (also called a simple pole).


Different possible branch cuts for the principal argument, principal logarithm and principal square root functions

For some work I was doing with a student, I was trying to find different ways of proving the familiar result that the complex square root function f(z) = \sqrt{z} is discontinuous everywhere on the negative real axis. As I was working on alternative proofs it became very clear to me how ‘sensitive’ all the proofs were to the particular definition of the principal argument I was using, namely that the principal argument \theta = \text{Arg} z is the unique argument of z satisfying -\pi < \theta \leq \pi. In a sense, this definition ‘manufactures’ the discontinuity of the complex square root function on the negative real axis, because the principal argument function itself is discontinuous here: the principal argument of a sequence of points approaching the negative real axis from above will tend to \pi, whereas the principal argument of a sequence approaching the same point on the negative real axis from below will tend to -\pi. I realised that all the proofs I was coming up with were exploiting this discontinuity of the principal argument function. However, this particular choice of principal argument function is completely arbitrary. An alternative could be to say that the principal argument of z is the unique argument satisfying 0 \leq \theta < 2\pi, which we can call \text{Arg}_{2\pi} z. The effect of this choice of principal argument function is to make the complex square root function discontinuous everywhere on the positive real axis! It turns out that we can choose an infinite number of different lines to be lines of discontinuity for the complex square root function, simply by choosing different definitions of the principal argument function. The same applies to the complex logarithm function. In this note I want to record some of my thoughts about this.

The reason for having to specify principal argument functions in the first place is that we need to make complex functions of complex variables single-valued rather than multiple-valued, to make them well-behaved with regard to operations like differentiation. Specifying a principal argument function in order to make a particular complex function single-valued is called choosing a branch of the function. If we specify the principal argument function to be f(z) = \text{Arg} z where -\pi < \text{Arg} z \leq \pi then we define the principal branch of the logarithm function to be

\text{Log} z = \text{log}_e |z| + i \text{Arg} z

for z \in \mathbb{C} - \{0\}, and the principal branch of the square root function to be

z^{\frac{1}{2}} = \text{exp}\big(\frac{1}{2} \text{Log} z \big)

for z \in \mathbb{C} with z \neq 0.

If we define the functions \text{Log} z and z^{\frac{1}{2}} in this way they will be single-valued, but the cost of doing this is that they will not be continuous on the whole of the complex plane (essentially because of the discontinuity of the principal argument function, which both functions ‘inherit’). They will be discontinuous everywhere on the negative real axis. The negative real axis is known as a branch cut for these functions. Using this terminology, what I want to explore in this short note is the fact that different choices of branch for these functions will result in different branch cuts for them.

To begin with, let’s formally prove the discontinuity of the principal argument function f(z) = \text{Arg} z, z \neq 0, and then see how this discontinuity is ‘inherited’ by the principal logarithm and square root functions. For the purposes of the proof we can consider the sequence of points

z_n = |\alpha| \text{e}^{(-\pi + 1/n)i}

where

\alpha \in \{ x \in \mathbb{R}: x < 0 \}

Clearly, as n \rightarrow \infty, we have z_n \rightarrow -|\alpha| = \alpha. However,

f(z_n) = \text{Arg} \big( |\alpha| \text{e}^{(-\pi + 1/n)i}\big)

= -\pi + \frac{1}{n}

\rightarrow -\pi

whereas

f(\alpha) = \text{Arg}\big(|\alpha| \text{e}^{\pi i} \big) = \pi

Therefore f(z_n) \not \rightarrow f(\alpha), so the principal argument function is discontinuous at all points on the negative real axis.

Now consider how the following proof of the discontinuity of f(z) = z^{\frac{1}{2}} on the negative real axis depends crucially on the discontinuity of \text{Arg} z. We again consider the sequence of points

z_n = |\alpha| \text{e}^{(-\pi + 1/n)i}

where

\alpha \in \{ x \in \mathbb{R}: x < 0 \}

so that z_n \rightarrow -|\alpha| = \alpha. However,

f(z_n) = z_n^{\frac{1}{2}} = \text{exp}\big(\frac{1}{2} \text{Log} z_n \big)

= \text{exp}\big( \frac{1}{2} \text{log}_e |z_n| + \frac{1}{2} i \text{Arg} z_n \big)

= \text{exp}\big( \frac{1}{2} \text{log}_e |\alpha| + \frac{1}{2} i (- \pi + \frac{1}{n}) \big)

\rightarrow |\alpha|^{\frac{1}{2}} \text{e}^{-i \pi /2} = - i |\alpha|^{\frac{1}{2}}

whereas

f(\alpha) = \big( |\alpha| \text{e}^{i \pi}\big)^{\frac{1}{2}}

= |\alpha|^{\frac{1}{2}} \text{e}^{i \pi/2} = i |\alpha|^{\frac{1}{2}}

Therefore f(z_n) \not \rightarrow f(\alpha), so the principal square root function is discontinuous at all points on the negative real axis.

Now suppose we choose a different branch for the principal logarithm and square root functions, say \text{Arg}_{2\pi} z which as we said earlier satisfies 0 \leq \text{Arg}_{2\pi} z < 2\pi. The effect of this is to change the branch cut of these functions to the positive real axis! The reason is that the principal argument function will now be discontinuous everywhere on the positive real axis, and this discontinuity will again be ‘inherited’ by the principal logarithm and square root functions.

To prove the discontinuity of the principal argument function f(z) = \text{Arg}_{2\pi} z on the positive real axis we can consider the sequence of points

z_n = \alpha \text{e}^{(2 \pi - 1/n)i}

where

\alpha \in \{ x \in \mathbb{R}: x > 0 \}

We have z_n \rightarrow \alpha. However,

f(z_n) = \text{Arg}_{2\pi} \big(\alpha \text{e}^{(2\pi - 1/n)i}\big)

= 2\pi - \frac{1}{n}

\rightarrow 2\pi

whereas

f(\alpha) = \text{Arg}_{2\pi}(\alpha) = 0

Therefore f(z_n) \not \rightarrow f(\alpha), so the principal argument function is discontinuous at all points on the positive real axis.

We can now again see how the following proof of the discontinuity of f(z) = z^{\frac{1}{2}} on the positive real axis depends crucially on the discontinuity of \text{Arg}_{2\pi} z there. We again consider the sequence of points

z_n = \alpha \text{e}^{(2\pi - 1/n)i}

where

\alpha \in \{ x \in \mathbb{R}: x > 0 \}

so that z_n \rightarrow \alpha. However,

f(z_n) = z_n^{\frac{1}{2}} = \text{exp}\big(\frac{1}{2} \text{Log} z_n \big)

= \text{exp}\big( \frac{1}{2} \text{log}_e |z_n| + \frac{1}{2} i \text{Arg}_{2\pi} z_n \big)

= \text{exp}\big( \frac{1}{2} \text{log}_e |\alpha| + \frac{1}{2} i (2 \pi - \frac{1}{n}) \big)

\rightarrow \alpha^{\frac{1}{2}} \text{e}^{i 2 \pi /2} = - \alpha^{\frac{1}{2}}

whereas

f(\alpha) = \alpha^{\frac{1}{2}}

Therefore f(z_n) \not \rightarrow f(\alpha), so the principal square root function is discontinuous at all points on the positive real axis.

There are infinitely many other branches to choose from. In general, if \tau is any real number, we can define the principal argument function to be f(z) = \text{Arg}_{\tau} z where

\tau \leq \text{Arg}_{\tau} z < \tau + 2\pi

and this will give rise to a branch cut for the principal logarithm and square root functions consisting of a ray emanating from the origin, namely the set of points z \neq 0 such that \text{arg}(z) = \tau modulo 2\pi.
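To make the general construction concrete, here is a small Python sketch. The helper names arg_tau and sqrt_tau are my own (they are not library functions): arg_tau picks the argument in [\tau, \tau + 2\pi), and the associated square root then jumps precisely across the ray \text{arg}(z) = \tau:

```python
import cmath

def arg_tau(z, tau):
    # principal argument chosen in the interval [tau, tau + 2*pi)
    theta = cmath.phase(z)  # lies in (-pi, pi]
    while theta < tau:
        theta += 2 * cmath.pi
    while theta >= tau + 2 * cmath.pi:
        theta -= 2 * cmath.pi
    return theta

def sqrt_tau(z, tau):
    # square root from the branch of Log determined by arg_tau
    return cmath.exp(0.5 * (cmath.log(abs(z)) + 1j * arg_tau(z, tau)))

# with tau = 0 the branch cut lies along the positive real axis:
print(sqrt_tau(1 + 1e-12j, 0.0))  # ~ +1 (argument near 0)
print(sqrt_tau(1 - 1e-12j, 0.0))  # ~ -1 (argument near 2*pi)

# with tau = -pi (the usual principal branch) the same two points agree:
print(sqrt_tau(1 + 1e-12j, -cmath.pi), sqrt_tau(1 - 1e-12j, -cmath.pi))
```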

Advanced Number Theory Note #16: A proof of the law of quadratic reciprocity using Gauss sums and quadratic characters

The prince of mathematicians (princeps mathematicorum), Carl Friedrich Gauss, arguably the greatest mathematician who ever lived, devoted a lot of attention to exploring alternative proofs of the law of quadratic reciprocity. As I mentioned in a previous note, this is actually a very deep result which has had a profound impact on modern mathematics. A rather good Wikipedia page about the quadratic reciprocity law has a section entitled connection with cyclotomy which makes clear its importance to the development of modern class field theory, and a history and alternative statements section catalogues its somewhat convoluted history.

In the present note I want to explore in detail one of the (many) approaches to proving the law of quadratic reciprocity, an approach which uses Gauss sums and Legendre symbols. (In a later note I will explore another proof using quadratic Gauss sums which involves contour integration techniques from complex analysis).

The proof consists of three key theorems, as follows:

Theorem I. This proves that G(1, \chi)^2 = (-1|p)p when \chi(r) = (r|p).

Theorem II. This proves that G(1, \chi)^{q-1} \equiv (q|p) (mod q) is equivalent to the law of quadratic reciprocity, using Theorem I.

Theorem III. This proves an identity for G(1, \chi)^{q-1} from which the congruence in Theorem II follows, thus completing the overall proof of the quadratic reciprocity law.

In a previous note I stated a version of the law of quadratic reciprocity due to Legendre as follows: if p and q are distinct odd primes then

(p|q) = \begin{cases} (q|p),& \text{if either } p \equiv 1 \ \text{(mod 4) } \text{or } q \equiv 1 \ \text{(mod 4)}\\ -(q|p), & \text{if } p \equiv q \equiv 3 \ \text{(mod 4)} \end{cases}

For the purposes of the proof in the present note it is necessary to express the quadratic reciprocity law as

(q|p) = (-1)^{(p - 1)(q - 1)/4}(p|q)

These two formulations are completely equivalent. To see this, note that if p \equiv 1 (mod 4) or q \equiv 1 (mod 4), the exponent on (-1) in the second formulation reduces to an even integer so we get (q|p) = (p|q). On the other hand, if p \equiv q \equiv 3 (mod 4), the exponent on (-1) in the second formulation reduces to an odd integer, so we get (q|p) = -(p|q).
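The law itself (in the second formulation) is easy to confirm numerically; the following small Python sketch checks it directly using SymPy's legendre_symbol (the bound 50 is an arbitrary choice):

```python
from sympy import legendre_symbol, primerange

for p in primerange(3, 50):
    for q in primerange(3, 50):
        if p != q:
            assert legendre_symbol(q, p) == (-1)**((p - 1)*(q - 1)//4) * legendre_symbol(p, q)
print("quadratic reciprocity verified for all pairs of distinct odd primes below 50")
```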

Also note that the proof makes use of Gauss sums incorporating the Legendre symbol as the Dirichlet character in the summand, i.e., Gauss sums of the form

G(n, \chi) = \sum_{r \text{mod } p} \chi(r) e^{2 \pi i n r/p}

where \chi(r) = (r|p), and the Legendre symbol in this context is referred to as the quadratic character mod p. Since the modulus is prime, the Dirichlet character here is primitive and we have that G(n, \chi) is separable with

G(n, \chi) = \overline{\chi(n)} G(1, \chi) = (n|p) G(1, \chi)

for every n, because either gcd(n, p) = 1, or if gcd(n, p) > 1 we must have p|n in which case G(n, \chi) = 0 because e^{2 \pi i n r/p} = 1 and \sum_{r \text{mod } p} \chi(r) = 0 (for non-principal characters the rows of the character tables sum to zero).

Theorem I. If p is an odd prime and \chi(r) = (r|p) then

G(1, \chi)^2 = (-1|p) p

Proof: We have

G(1, \chi) = \sum_{r = 1}^{p - 1} (r|p) e^{2 \pi i r/p}

and therefore

G(1, \chi)^2 = \sum_{r = 1}^{p - 1} (r|p) e^{2 \pi i r/p} \times \sum_{s = 1}^{p - 1} (s|p) e^{2 \pi i s/p}

= \sum_{r = 1}^{p - 1} \sum_{s = 1}^{p - 1} (r|p) (s|p) e^{2 \pi i (r + s)/p}

For each pair of values of r and s there is a unique t mod p such that

s \equiv tr (mod p)

since this is a linear congruence with a unique solution. We also have that

(r|p) (s|p) = (r|p) (tr|p)

= (r|p) (r|p) (t|p)

= (r^2|p) (t|p)

= (t|p)

Therefore we can write

G(1, \chi)^2 = \sum_{r = 1}^{p - 1} \sum_{tr = 1}^{p - 1} (t|p) e^{2 \pi i r (1 + t)/p}

= \sum_{r = 1}^{p - 1} \sum_{t = 1}^{p - 1} (t|p) e^{2 \pi i r (1 + t)/p}

(where the index in the second summation has been reduced to t since t will range through all the least positive residues of p independently of r)

= \sum_{t = 1}^{p - 1} (t|p) \sum_{r = 1}^{p - 1} e^{2 \pi i r (1 + t)/p}

The last sum on r is a geometric sum of the form

g(1 + t) = \sum_{r = 1}^{p - 1} x^r

where

x = e^{2 \pi i (1 + t)/p}

so we have

g(1 + t) = \begin{cases} \frac{x^p - x}{x - 1} & \text{if } x \neq 1\\ p - 1 & \text{if } x = 1 \end{cases}

But it must be the case that x^p = 1 (because x is a pth root of unity), and we also have that x = 1 if and only if p|(1 + t), so we can write

g(1 + t) = \begin{cases} -1 & \text{if } p \nmid (1 + t) \\ p - 1 & \text{if } p | (1 + t) \end{cases}

Therefore

G(1, \chi)^2 = - \sum_{t = 1}^{p - 2} (t|p) + (p - 1) (p - 1|p)

(because the only value of t for which p|(1 + t) is t = p - 1, so we can pull this out of the summation and then for this value of t the Legendre symbol (t|p) becomes (p - 1|p))

= - \sum_{t = 1}^{p - 2} (t|p) - (p - 1|p) + p(p - 1|p)

= - \sum_{t = 1}^{p - 1} (t|p) + p(p - 1|p)

= - \sum_{t = 1}^{p - 1} (t|p) + p(-1|p)

(since (p - 1|p) = (-1|p))

= (-1|p) p

(because \sum_{t = 1}^{p - 1} (t|p) = 0 since (t|p) is a Dirichlet character mod p and the rows of Dirichlet character tables sum to zero). \square
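Theorem I is easy to check numerically for small primes (a short Python sketch; I use (p - 1|p) = (-1|p) so that only non-negative arguments are passed to SymPy's legendre_symbol):

```python
import cmath
from sympy import legendre_symbol

for p in (3, 5, 7, 11, 13, 17):
    G = sum(legendre_symbol(r, p) * cmath.exp(2j * cmath.pi * r / p) for r in range(1, p))
    # (p - 1 | p) = (-1 | p), so the right-hand value below is (-1|p) p
    print(p, round((G * G).real), legendre_symbol(p - 1, p) * p)  # the two values agree
```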

Since (-1|p) = \pm 1, Theorem I tells us that G(1, \chi)^2 is an integer and it then follows that

G(1, \chi)^{q - 1} = (G(1, \chi)^2)^{(q - 1)/2}

is also an integer for every odd q. It turns out that the law of quadratic reciprocity is intimately related to the value of the integer G(1, \chi)^{q - 1} mod q, which is what the next theorem shows.

Theorem II. Let p and q be distinct odd primes and let \chi be the quadratic character (i.e., the Legendre symbol) mod p. Then the quadratic reciprocity law

(q|p) = (-1)^{(p - 1)(q - 1)/4}(p|q)

is equivalent to the congruence

G(1, \chi)^{q - 1} \equiv (q|p) (mod q)

Proof: From the result proved in Theorem I we have that

G(1, \chi)^{q - 1} = (G(1, \chi)^2)^{(q - 1)/2}

= (-1|p)^{(q - 1)/2} p^{(q - 1)/2}

= (-1)^{(p - 1)(q - 1)/4} p^{(q - 1)/2}

where the last equality follows from property (e) of Legendre symbols in my previous note which implies

(-1|p) = (-1)^{(p - 1)/2}

By property (d) of Legendre symbols we also have

p^{(q - 1)/2} \equiv (p|q) (mod q)

so we can write

G(1, \chi)^{q - 1} \equiv (-1)^{(p - 1)(q - 1)/4} (p|q) (mod q)

and this congruence holds unconditionally. If the law of quadratic reciprocity holds, the right-hand side can be replaced by (q|p) and we obtain

G(1, \chi)^{q - 1} \equiv (q|p) (mod q)

Conversely, if this last congruence holds then (-1)^{(p - 1)(q - 1)/4} (p|q) \equiv (q|p) (mod q), and since both sides equal \pm 1 and q > 2, the congruence forces equality, which is precisely the law of quadratic reciprocity. \square
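Using Theorem I to evaluate G(1, \chi)^{q - 1} as the integer ((-1|p)p)^{(q - 1)/2}, the congruence can also be checked numerically as a consistency test (a small Python sketch; the bound 30 is arbitrary):

```python
from sympy import legendre_symbol, primerange

for p in primerange(3, 30):
    for q in primerange(3, 30):
        if p != q:
            G_sq = legendre_symbol(p - 1, p) * p  # G(1, chi)^2 = (-1|p) p by Theorem I
            lhs = pow(G_sq, (q - 1)//2, q)        # G(1, chi)^(q-1) reduced mod q
            assert lhs == legendre_symbol(q, p) % q
print("the congruence holds for all pairs of distinct odd primes below 30")
```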

The last stage of the proof is now to deduce the congruence in Theorem II from an identity for G(1, \chi)^{q - 1} which is established in the next theorem.

Theorem III. If p and q are distinct odd primes and if \chi is the quadratic character (i.e., Legendre symbol) mod p, we have

G(1, \chi)^{q - 1} = (q|p) \sum_{r_1 \,\text{mod}\, p} \cdots \sum_{r_q \,\text{mod}\, p} (r_1 \cdots r_q |p)

where the summation indices satisfy the restriction

r_1 + \cdots + r_q \equiv q (mod p)

Proof: It is easy to show that the Gauss sum G(n, \chi) is a periodic function of n with period p since

G(n + p, \chi) = \sum_{m = 1}^{p} \chi(m) e^{2 \pi i m (n + p)/p}

= \sum_{m = 1}^{p} \chi(m) e^{2 \pi i m n/p} e^{2 \pi i m}

= \sum_{m = 1}^{p} \chi(m) e^{2 \pi i m n/p} = G(n, \chi)

Since therefore

G(n, \chi)^q = G(n + p, \chi)^q

it follows that G(n, \chi)^q is also a periodic function of n with period p. Therefore we have a finite Fourier expansion

G(n, \chi)^q = \sum_{m \,\text{mod}\, p} a_q(m) e^{2 \pi i m n/p}

where the coefficients are given by

a_q(m) = \frac{1}{p} \sum_{n \,\text{mod}\, p} G(n, \chi)^q e^{-2 \pi i m n/p}

(see my previous note on finding the finite Fourier expansion of an arithmetical function). Simply from the general definition of G(n, \chi) (using Legendre symbols as Dirichlet characters) we have

G(n, \chi)^q = \sum_{r_1 \,\text{mod}\, p} (r_1|p) e^{2 \pi i n r_1/p} \cdots \sum_{r_q \,\text{mod}\, p} (r_q|p) e^{2 \pi i n r_q/p}

= \sum_{r_1 \,\text{mod}\, p} \cdots \sum_{r_q \,\text{mod}\, p} (r_1 \cdots r_q |p) e^{2 \pi i n (r_1 + \cdots + r_q)/p}

so we can write the above Fourier expansion coefficients as

a_q(m) = \frac{1}{p} \sum_{r_1 \,\text{mod}\, p} \cdots \sum_{r_q \,\text{mod}\, p} (r_1 \cdots r_q |p) \times \sum_{n \,\text{mod}\, p} e^{2 \pi i n (r_1 + \cdots + r_q - m)/p}

The sum on n is a geometric sum of the form

g(r_1 + \cdots + r_q - m) = \sum_{n = 0}^{p - 1} x^n

where

x = e^{2 \pi i (r_1 + \cdots + r_q - m)/p}

so we have

g(r_1 + \cdots + r_q - m) = \begin{cases} \frac{x^p - 1}{x - 1} & \text{if } x \neq 1\\ p & \text{if } x = 1 \end{cases}

= \begin{cases} 0 & \text{if } x \neq 1\\ p & \text{if } x = 1 \end{cases}

(since x^p = 1 because x is a pth root of unity). Therefore in the expression for a_q(m) the sum on n vanishes unless r_1 + \cdots + r_q \equiv m (mod p), in which case the sum is equal to p. Therefore we can write

a_q(m) = \sum_{r_1 \,\text{mod}\, p} \cdots \sum_{r_q \,\text{mod}\, p} (r_1 \cdots r_q |p)

where the summation indices satisfy the restriction

r_1 + \cdots + r_q \equiv m (mod p)

Now we return to the original expression for a_q(m), namely

a_q(m) = \frac{1}{p} \sum_{n \,\text{mod}\, p} G(n, \chi)^q e^{-2 \pi i m n/p}

and use this to obtain a different expression for a_q(m). The separability of G(n, \chi) means that

G(n, \chi) = (n|p) G(1, \chi)

We also have the result for odd q that

(n|p)^q = (n^q|p) = (n^{q-1}|p) (n|p) = (n|p)

(since q - 1 is even). Therefore we find

a_q(m) = \frac{1}{p} G(1, \chi)^q \sum_{n \,\text{mod}\, p} (n|p) e^{-2 \pi i m n/p}

= \frac{1}{p} G(1, \chi)^q G(-m, \chi)

= \frac{1}{p} G(1, \chi)^q (m|p) G(-1, \chi)

= (m|p) G(1, \chi)^{q - 1}

where the last equality follows from the fact that

G(1, \chi) G(-1, \chi) = G(1, \chi) (-1|p) G(1, \chi)

= G(1, \chi)^2 (-1|p)

= (-1|p) p (-1|p)

= ((-1)^2|p) p

= p

Therefore

(m|p) a_q(m) = (m|p) (m|p) G(1, \chi)^{q - 1}

= (m^2|p) G(1, \chi)^{q - 1}

= G(1, \chi)^{q - 1}

Taking m = q (so that p \nmid m, which justifies the last step) and using the previously obtained expression for a_q(m), we get the claimed result

G(1, \chi)^{q - 1} = (q|p) \sum_{r_1 \,\text{mod}\, p} \cdots \sum_{r_q \,\text{mod}\, p} (r_1 \cdots r_q |p)

where the summation indices satisfy the restriction

r_1 + \cdots + r_q \equiv q (mod p). \square

We are now in a position to deduce the law of quadratic reciprocity from Theorems I, II and III. By Theorem III, and in view of the equivalence established in Theorem II, it suffices to show that

\sum_{r_1 \,\text{mod}\, p} \cdots \sum_{r_q \,\text{mod}\, p} (r_1 \cdots r_q |p) \equiv 1 (mod q)

where the summation indices satisfy the restriction

r_1 + \cdots + r_q \equiv q (mod p)

i.e., the sum runs over all q-tuples (r_1, \ldots, r_q) of residues satisfying this restriction. One way in which the restriction is satisfied is when

r_j \equiv 1 (mod p)

for j = 1, \dots, q. In this case we have

(r_1 \cdots r_q |p) = (1|p) = 1

Every other possible way of satisfying the restriction involves

r_j \not \equiv r_k (mod p)

for some j \neq k. For each such non-constant tuple, the q cyclic permutations of r_1, \ldots, r_q are all distinct (here we use the fact that q is prime: a non-constant q-tuple cannot be fixed by a non-trivial cyclic shift), each satisfies the restriction, and each contributes the same summand (r_1 \cdots r_q |p). The tuples of this kind therefore fall into classes of size q, with each class contributing 0 modulo q to the sum. Therefore only the scenario r_j \equiv 1 (mod p) for j = 1, \dots, q yields a non-zero contribution, so the sum is \equiv 1 (mod q). This completes the proof of the law of quadratic reciprocity.

To clarify the last point, consider the following

Example: Take p = 5 and q = 3. Then the claim is that

\sum_{r_1 \,\text{mod}\, 5} \sum_{r_2 \,\text{mod}\, 5} \sum_{r_3 \,\text{mod}\, 5} (r_1 r_2 r_3|5) \equiv 1 (mod 3)

where the summation indices satisfy the restriction

r_1 + r_2 + r_3 \equiv 3 (mod 5)

In the case when r_1 \equiv r_2 \equiv r_3 \equiv 1 (mod 5) we have

(r_1 r_2 r_3|5) = (1|5) = 1

so this tuple contributes 1 to the sum.

Suppose we consider any other way of satisfying the restriction, say

r_1 \equiv 1, r_2 \equiv 3, r_3 \equiv 4 (mod 5)

so that

r_1 + r_2 + r_3 \equiv 8 \equiv 3 (mod 5)

Then the cyclic permutations

r_1 \equiv 4, r_2 \equiv 1, r_3 \equiv 3 (mod 5)

and

r_1 \equiv 3, r_2 \equiv 4, r_3 \equiv 1 (mod 5)

will also satisfy the restriction, and these contribute a total of

3 (1 \cdot 3 \cdot 4|5) \equiv 0 (mod 3)

to the sum. Therefore only the first way of satisfying the restriction contributes a non-zero amount modulo 3, so the sum is \equiv 1 (mod 3), as claimed.
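The whole example can be confirmed by brute force (a small Python sketch; tuples containing a residue \equiv 0 (mod 5) are skipped since their summands vanish):

```python
from itertools import product
from sympy import legendre_symbol

p, q = 5, 3
total = sum(legendre_symbol(r1 * r2 * r3 % p, p)
            for r1, r2, r3 in product(range(1, p), repeat=3)
            if (r1 + r2 + r3) % p == q % p)
print(total % q)  # 1, as the proof requires
```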

Advanced Number Theory Note #15: The Legendre symbol (a|p) as a Dirichlet character mod p

The Legendre symbol was introduced by the great French mathematician Adrien-Marie Legendre (1752-1833); the caricature shown here is the only known contemporary likeness of him. It has proved to be very useful as a shorthand for stating a number’s quadratic character and also in calculations thereof.

If p is an odd prime, then the Legendre symbol (a|p) = 1 if a is a quadratic residue of p, (a|p) = -1 if a is a quadratic non-residue of p, and (a|p) = 0 if a \equiv 0 (mod p).

The Legendre symbol has a number of well known properties which are useful for calculations and are summarised here for convenience:

[Image: list of properties (a)-(e) of the Legendre symbol, including (a) if a \equiv b (mod p) then (a|p) = (b|p); (d) Euler's criterion, (a|p) \equiv a^{(p-1)/2} (mod p); and (e) (-1|p) = (-1)^{(p-1)/2}]

[Image: the quadratic character of 2]

[Image: statement of the law of quadratic reciprocity]

The last property, the law of quadratic reciprocity, is actually a deep result which has been studied in depth and proved in numerous different ways by Gauss and others. Indeed, it was for the purpose of finding his own proof of this result that Legendre invented the Legendre symbol. In a later note I will explore in detail a proof of the law of quadratic reciprocity using Gauss sums and Legendre symbols. This proof hinges on the fact that the Legendre symbol (a|p) is a Dirichlet character mod p. In the present short note I want to quickly show explicitly why this is the case by highlighting three key facts about Legendre symbols:

I. The Legendre symbol (a|p) is a completely multiplicative function of a.

II. The Legendre symbol (a|p) is periodic with period p.

III. The Legendre symbol vanishes when p|a.

Fact III follows immediately from the definition of Legendre symbols, and II is true because we have

a \equiv a + p (mod p)

and therefore (by property (a) above)

(a|p) = (a + p|p)

so the Legendre symbol is periodic with period p.

To prove I, observe that if p|a or p|b then ab \equiv 0 (mod p) so

(ab|p) = (a|p) \cdot (b|p) = 0

since at least one of (a|p) or (b|p) must be zero.

If p \nmid a and p \nmid b, then p \nmid ab and we have (by property (d) above)

(ab|p) \equiv (ab)^{(p-1)/2}

\equiv (a)^{(p-1)/2} \cdot (b)^{(p-1)/2}

\equiv (a|p) \cdot (b|p) (mod p)

Therefore

(ab|p) - (a|p) \cdot (b|p)

is divisible by p. But this difference can only take the values 0, 2 or -2 (since each term is \pm 1), and the only one of these divisible by the odd prime p \geq 3 is 0. The difference must therefore be zero, so the Legendre symbol is completely multiplicative as claimed in I.

Since (a|p) is a completely multiplicative function of a which is periodic with period p and vanishes when p|a, it follows that (a|p) is a Dirichlet character \chi(a) mod p as claimed.

I will illustrate this with two examples. First, let p = 7. We have

1^2 \equiv 1 (mod 7)

2^2 \equiv 4 (mod 7)

3^2 \equiv 2 (mod 7)

so the quadratic residues of 7 are 1, 2 and 4 and the quadratic non-residues are 3, 5 and 6. The Legendre symbol (a|7) therefore takes the values

(1|7) = 1

(2|7) = 1

(3|7) = -1

(4|7) = 1

(5|7) = -1

(6|7) = -1

These are exactly the values of the fourth character in the Dirichlet character table mod 7:

[Image: Dirichlet character table mod 7]

Thus, (a|7) = \chi_4(a) mod 7.

For a second example, let p = 11. We have

1^2 \equiv 1 (mod 11)

2^2 \equiv 4 (mod 11)

3^2 \equiv 9 (mod 11)

4^2 \equiv 5 (mod 11)

5^2 \equiv 3 (mod 11)

so the quadratic residues of 11 are 1, 3, 4, 5 and 9, and the quadratic non-residues are 2, 6, 7, 8 and 10. The Legendre symbol (a|11) therefore takes the values

(1|11) = 1

(2|11) = -1

(3|11) = 1

(4|11) = 1

(5|11) = 1

(6|11) = -1

(7|11) = -1

(8|11) = -1

(9|11) = 1

(10|11) = -1

These are exactly the values of the sixth character in the Dirichlet character table mod 11:

[Image: Dirichlet character table mod 11]

Thus, (a|11) = \chi_6(a) mod 11.
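The values in both examples, together with properties I and II above, can be confirmed with a few lines of Python (a small sketch using SymPy's legendre_symbol):

```python
from sympy import legendre_symbol

for p in (7, 11):
    # the rows printed here match the character values tabulated above
    print(p, [legendre_symbol(a, p) for a in range(1, p)])
    # I: completely multiplicative in a
    assert all(legendre_symbol(a * b % p, p) == legendre_symbol(a, p) * legendre_symbol(b, p)
               for a in range(1, p) for b in range(1, p))
    # II: periodic with period p
    assert all(legendre_symbol(a + p, p) == legendre_symbol(a, p) for a in range(1, p))
```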

A note on the quaternion rotation operator

Sir William Rowan Hamilton famously discovered the key rules for quaternion algebra while walking with his wife past a bridge in Dublin in 1843. A plaque (shown left) now commemorates this event.

I needed to use the quaternion rotation operator recently and, while digging around the literature on this topic, I noticed that a lot of it is quite unclear and over-complicated. See, e.g., this Wikipedia article about it and the references therein. A couple of simple yet vital ideas would, if they were spelt out, make key results seem less mysterious, but these never seem to be mentioned. The vector notation often used in this area also tends to over-complicate things. In this note I want to record some thoughts about the quaternion rotation operator, bringing out some key underlying ideas that (to me) make it much less mysterious.

Quaternions are hypercomplex numbers of the form

q = a + bi + cj + dk

In many ways (as I will show below) they can usefully be thought about using familiar ideas for two-dimensional complex numbers x + yi in which i \equiv \sqrt{-1}.

In the case of quaternions, the identities i^2 = j^2 = k^2 = ijk = -1 discovered by Hamilton determine all possible products of i, j and k. They imply a cyclic relationship when calculating their products (similar to that of the cross products of the three-dimensional basis vectors i, j, k, which is why authors often use these basis vectors when defining quaternions). Taking products clockwise one obtains

ij = k
jk = i
ki = j

and taking products anticlockwise one obtains

ik = -j
kj = -i
ji = -k

This algebraic structure leads to some interesting differences from the familiar algebra of ordinary two-dimensional complex numbers, particularly non-commutativity of multiplication. Technically, one says that the set of all quaternions with the operations of addition and multiplication constitute a non-commutative division ring, i.e., every non-zero quaternion has an inverse and quaternion products are generally non-commutative.

I was interested to notice that one way in which this algebraic difference with ordinary complex numbers manifests itself is in taking the complex conjugate of products. With ordinary complex numbers z_1 = x_1 + y_1 i and z_2 = x_2 + y_2 i one obtains

\overline{z_1 \cdot z_2} = \overline{z_1} \cdot \overline{z_2}

since

z_1 \cdot z_2 = x_1 x_2 + (x_1 y_2 + x_2 y_1)i - y_1 y_2 = (x_1 x_2 - y_1 y_2) + (x_1 y_2 + x_2 y_1)i

and therefore

\overline{z_1 \cdot z_2} = (x_1 x_2 - y_1 y_2) - (x_1 y_2 + x_2 y_1)i

but this is the same as

\overline{z_1} \cdot \overline{z_2} = (x_1 - y_1 i)(x_2 - y_2 i) = x_1 x_2 - (x_1 y_2 + x_2 y_1)i - y_1 y_2

= (x_1 x_2 - y_1 y_2) - (x_1 y_2 + x_2 y_1)i

With the product of two quaternions q_1 and q_2 we get a different result:

\overline{q_1 \cdot q_2} = \overline{q_2} \cdot \overline{q_1}

In words, the complex conjugate of the product is the product of the complex conjugates in reverse order. To see this, let

q_1 = a_1 + b_1i + c_1j + d_1k

q_2 = a_2 + b_2i + c_2j + d_2k

Then

q_1 \cdot q_2 =

a_1a_2 + a_1b_2i + a_1c_2j + a_1d_2k

+ a_2b_1i - b_1b_2 + b_1c_2k - b_1d_2j

+ a_2c_1j - b_2c_1k - c_1c_2 + c_1d_2i

+ a_2d_1k + b_2d_1j - c_2d_1i - d_1d_2

=

(a_1a_2 - b_1b_2 - c_1c_2 - d_1d_2)

+ (a_1b_2 + a_2b_1 + c_1d_2 - c_2d_1)i

+ (a_1c_2 - b_1d_2 + a_2c_1 + b_2d_1)j

+ (a_1d_2 + b_1c_2 - b_2c_1 + a_2d_1)k

Therefore

\overline{q_1 \cdot q_2} =

(a_1a_2 - b_1b_2 - c_1c_2 - d_1d_2)

+ (c_2d_1 - a_1b_2 - a_2b_1 - c_1d_2)i

+ (b_1d_2 - a_1c_2 - a_2c_1 - b_2d_1)j

+ (b_2c_1 - a_1d_2 - b_1c_2 - a_2d_1)k

But this is the same as

\overline{q_2} \cdot \overline{q_1}

= (a_2 - b_2i - c_2j - d_2k)(a_1 - b_1i - c_1j - d_1k)

= a_1a_2 - a_2b_1i - a_2c_1j - a_2d_1k

- a_1b_2i - b_1b_2 + b_2c_1k - b_2d_1j

- a_1c_2j - b_1c_2k - c_1c_2 + c_2d_1i

- a_1d_2k + b_1d_2j - c_1d_2i - d_1d_2

=

(a_1a_2 - b_1b_2 - c_1c_2 - d_1d_2)

+ (c_2d_1 - a_1b_2 - a_2b_1 - c_1d_2)i

+ (b_1d_2 - a_1c_2 - a_2c_1 - b_2d_1)j

+ (b_2c_1 - a_1d_2 - b_1c_2 - a_2d_1)k
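This conjugate-reversal identity can also be confirmed symbolically, for instance with SymPy's Quaternion class (a minimal sketch; I write out the conjugate explicitly as a small helper rather than relying on any built-in method):

```python
import sympy as sp
from sympy.algebras.quaternion import Quaternion

a1, b1, c1, d1, a2, b2, c2, d2 = sp.symbols('a1 b1 c1 d1 a2 b2 c2 d2', real=True)
q1 = Quaternion(a1, b1, c1, d1)
q2 = Quaternion(a2, b2, c2, d2)

def conj(q):
    # quaternion conjugate, written out explicitly
    return Quaternion(q.a, -q.b, -q.c, -q.d)

lhs, rhs = conj(q1 * q2), conj(q2) * conj(q1)
print(all(sp.expand(getattr(lhs, s) - getattr(rhs, s)) == 0 for s in 'abcd'))  # True
```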

The complex conjugate is used to define the length |q| of a quaternion

q = a + bi + cj + dk

as

|q| = \sqrt{\overline{q} \cdot q}

= \sqrt{(a - bi - cj - dk)(a + bi + cj + dk)}

= \sqrt{a^2 + b^2 + c^2 + d^2}

To find the inverse q^{-1} of a quaternion we observe that

q \cdot q^{-1} = 1

so

\overline{q} \cdot q \cdot q^{-1} = \overline{q}

\iff |q|^2 \cdot q^{-1} = \overline{q}

\iff q^{-1} = \frac{\overline{q}}{|q|^2}

For a quaternion

q = a + bi + cj + dk

let

r \equiv \sqrt{b^2 + c^2 + d^2}

A key result that helps to clarify the literature on quaternion rotation is that

\frac{bi + cj + dk}{r} = \sqrt{-1}

This seems mysterious at first but can easily be verified by confirming that when the term on the left hand side is multiplied by itself, the result is -1:

\frac{1}{r^2}(bi + cj + dk)(bi + cj + dk)

= \frac{1}{r^2}(-b^2 + bck - bdj -cbk - c^2 + cdi + dbj - dci - d^2)

= \frac{-b^2 - c^2 - d^2}{r^2}

= -\frac{r^2}{r^2} = -1

This result means that any quaternion of the above form can be written as

q = a + r \big( \frac{bi + cj + dk}{r} \big)

= a + r \sqrt{-1}

This is just a familiar two-dimensional complex number! It therefore has an angle \theta associated with it, given by the equations

[Figure: Argand-type diagram showing the angle \theta associated with q = a + r\sqrt{-1}]

\cos \theta = \frac{a}{|q|}

\sin \theta = \frac{r}{|q|}

\tan \theta = \frac{r}{a}

We can express the quaternion in terms of this angle as

q = |q|(\cos \theta + \frac{bi + cj + dk}{r} \sin \theta)

If q is a unit quaternion (i.e., |q| = 1) we then get that

q = \cos \theta + \frac{bi + cj + dk}{r} \sin \theta

This is the form that is needed in the context of quaternion rotation. It turns out that for any unit quaternion of this form and for any vector (v_1, v_2, v_3) \in \mathbb{R}^3, regarded as the pure quaternion v_1i + v_2j + v_3k, the operation

L_q(v_1, v_2, v_3) = q \cdot (v_1, v_2, v_3) \cdot \overline{q}

will result in a rotation of the vector (v_1, v_2, v_3) through an angle 2 \theta about the vector (b, c, d) as the axis of rotation. The direction of rotation is given by the familiar right-hand rule, i.e., the thumb of the right hand points in the direction of the vector (b, c, d) and the fingers then curl in the direction of rotation.

As an example, suppose we want to rotate the vector (0, 0, 1) through 90^{\circ} about the vector (0, 1, 0) in the sense of the right-hand rule.

[Figure: the unit vectors along the coordinate axes, showing (0, 0, 1) being rotated to (1, 0, 0) about (0, 1, 0)]

Looking at the diagram above, we would expect the result to be the vector (1, 0, 0). Using the quaternion rotation operator to achieve this, we would specify

2 \theta = 90^{\circ} \implies \theta = 45^{\circ}

\cos \theta = \frac{1}{\sqrt{2}}

\sin \theta = \frac{1}{\sqrt{2}}

\frac{bi + cj + dk}{r} = \frac{0i + 1j + 0k}{1} = j

q = \frac{1}{\sqrt{2}} + \frac{1}{\sqrt{2}} j

\overline{q} = \frac{1}{\sqrt{2}} - \frac{1}{\sqrt{2}} j

v_1i + v_2j + v_3k = 0i + 0j + 1k = k

The resulting vector would be

L_q(v_1, v_2, v_3) = q \cdot (v_1i + v_2j + v_3k) \cdot \overline{q}

= \big(\frac{1}{\sqrt{2}} + \frac{1}{\sqrt{2}}j\big)(k)\big(\frac{1}{\sqrt{2}} - \frac{1}{\sqrt{2}}j\big)

= \big(\frac{1}{\sqrt{2}}k + \frac{1}{\sqrt{2}}i\big)\big(\frac{1}{\sqrt{2}} - \frac{1}{\sqrt{2}}j\big)

= \frac{1}{2}k + \frac{1}{2}i + \frac{1}{2}i - \frac{1}{2}k

= i

= 1i + 0j + 0k

This result is interpreted as the vector (1, 0, 0) which is exactly what we expected based on the diagram above.
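The worked example is also easy to reproduce in code. The following minimal Python sketch (the function names qmul and rotate are my own) implements the Hamilton product exactly as tabulated earlier, together with the rotation operator L_q:

```python
import math

def qmul(p, q):
    # Hamilton product of quaternions represented as tuples (a, b, c, d) = a + bi + cj + dk
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + a2*b1 + c1*d2 - c2*d1,
            a1*c2 - b1*d2 + a2*c1 + b2*d1,
            a1*d2 + b1*c2 - b2*c1 + a2*d1)

def rotate(v, axis, angle):
    # L_q(v) = q v q-bar with q = cos(angle/2) + (unit axis) sin(angle/2),
    # which rotates v through `angle` about `axis` (right-hand rule)
    b, c, d = axis
    r = math.sqrt(b*b + c*c + d*d)
    s = math.sin(angle / 2) / r
    q = (math.cos(angle / 2), s*b, s*c, s*d)
    qbar = (q[0], -q[1], -q[2], -q[3])
    return qmul(qmul(q, (0.0, *v)), qbar)[1:]  # drop the (zero) real part

print(rotate((0, 0, 1), (0, 1, 0), math.pi / 2))  # ~ (1.0, 0.0, 0.0)
```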

Note that to achieve the same result through conventional matrix algebra we would have to use the unwieldy rotation matrix

\begin{bmatrix} \cos \varphi & 0 & \sin \varphi\\0 & 1 & 0\\-\sin \varphi & 0 & \cos \varphi\end{bmatrix}

Setting \varphi = 90^{\circ} and applying this to the vector (0, 0, 1)^T we get

\begin{bmatrix} 0 & 0 & 1\\0 & 1 & 0\\-1 & 0 & 0\end{bmatrix} \begin{bmatrix}0\\0\\1\end{bmatrix}

= \begin{bmatrix}1\\0\\0\end{bmatrix}

This is the same result, but note that this approach is more complicated to implement because there are different rotation matrices for different axes of rotation and for different rotation conventions. The above rotation matrix happens to be the one required to achieve a rotation about the y-axis using the right-hand rule. In general, the correct rotation matrix would have to be computed each time to suit the particular rotation required. Once the angle of rotation and the axis of rotation are known, the quaternion rotation operator is much easier to specify and implement.