On discrete and continuous forms of Slutsky’s equation

I have noticed that there is a sharp ‘jump’ in the literature concerning Slutsky’s decomposition equation for the effects of a price change on demand, from elementary discussions of a discrete version for relatively large price changes (which can be illustrated in a diagram), to the full-blown partial differential equation form for infinitesimal price changes expressed in terms of a Hicksian demand function (in more advanced texts). I have not been able to find any appealing discussion of how the set-up that leads to the discrete form relates to the partial differential equation form as we pass to the limit of infinitesimal price changes. In this note I want to explore the link between these two versions.

Slutsky’s decomposition equation expresses the effect of a price change on Marshallian demand as the sum of a pure substitution effect, which is always negative, and an income effect, which reinforces the substitution effect when the good is a normal good and works against it when the good is inferior. I will employ a simple two-good numerical example to illustrate the discrete form of Slutsky’s equation, using a Cobb-Douglas utility function of the form

u(x, y) = x^{1/2}y^{1/2}

The mathematical problem is to find the combination of x and y which maximises this utility function subject to the budget constraint

p_1x + p_2y = m

The Cobb-Douglas utility function is globally concave and smooth so we are guaranteed to find a unique interior solution by partial differentiation. One normally proceeds by taking the natural logarithm of the utility function (this is a monotonic transformation so does not affect the preferences represented by the original utility function) and setting up the Lagrangian for the problem, namely

L = \frac{1}{2}\ln x + \frac{1}{2}\ln y + \lambda (m - p_1x -p_2y)

Taking first-order partial derivatives with respect to x, y and \lambda and setting them equal to zero we get

\frac{\partial L}{\partial x}  = \frac{1}{2x} - \lambda p_1 = 0

\frac{\partial L}{\partial y} = \frac{1}{2y} - \lambda p_2 = 0

\frac{\partial L}{\partial \lambda} = m - p_1x - p_2y = 0

This is a system of three equations in three unknowns. Dividing the first equation by the second and rearranging one obtains

\frac{y}{x} = \frac{p_1}{p_2}

Solving the third equation for x we get

x = \frac{m}{p_1} - \frac{p_2}{p_1}y

and substituting this into the equation above we get

\frac{y}{\frac{m}{p_1} - \frac{p_2}{p_1}y} = \frac{p_1}{p_2}

\iff

y = \frac{m}{2p_2}

This is the uncompensated demand function (often also called the Marshallian demand function) for good y. By symmetry, the uncompensated demand function for good x is

x = \frac{m}{2p_1}

(Note that rearranging the demand function for y we get p_2y = \frac{m}{2}, which says that the consumer will spend exactly half of their income on y, and similarly for x. Whenever the Cobb-Douglas utility function is in a form in which the exponents on the goods are fractions summing to 1, these fractions tell us the proportions of income spent on the corresponding goods. Our utility function was of the form u(x, y) = x^{1/2}y^{1/2}, so one-half of total income is spent on each good, as the above calculation confirms.)
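As a quick check of this derivation, here is a minimal sympy sketch (my own addition, not part of the original argument) which solves the first-order conditions symbolically and recovers both demand functions:

```python
# Minimal sympy sketch: solve the first-order conditions of the log-transformed
# Cobb-Douglas problem and recover x = m/(2 p1), y = m/(2 p2).
import sympy as sp

x, y, lam, p1, p2, m = sp.symbols('x y lambda p1 p2 m', positive=True)

# Lagrangian with the log-transformed utility, exactly as in the text
L = sp.Rational(1, 2)*sp.log(x) + sp.Rational(1, 2)*sp.log(y) + lam*(m - p1*x - p2*y)

# First-order conditions with respect to x, y and lambda
foc = [sp.diff(L, v) for v in (x, y, lam)]
sol = sp.solve(foc, (x, y, lam), dict=True)[0]

print(sol[x])   # m/(2*p1)
print(sol[y])   # m/(2*p2)
```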

To illustrate Slutsky’s decomposition of the effects of a price change into a pure substitution effect and an income effect, consider the above uncompensated demand function for x and suppose that m = \pounds 1000 while p_1 = \pounds 10. The amount of x demanded at this income and price is then

x(p_1, m) = \frac{1000}{(2)(10)}= 50

This corresponds to the amount of x in the bundle A in the diagram below.

 

[Figure: the Slutsky decomposition of a price rise, showing the original budget line through bundle A, the compensated (blue) budget line through bundles A and B, and the final budget line through bundle C]

 

Now suppose that the price rises to p_1^{*} = \pounds 20

The amount of x demanded at the original income and this new price falls to

x(p_1^{*}, m) = \frac{1000}{(2)(20)} = 25

This corresponds to the amount of x in the bundle C in the diagram.

Slutsky’s decomposition of this total change in demand begins by asking what change in income would be enough to enable the consumer to buy the original amount of x at the new price. This amount of additional income is obtained as

p_1^{*}x - p_1x = (20)(50) - (10)(50) = \pounds 500

Therefore, ‘compensating’ the consumer by increasing the income level from m = \pounds 1000 to m^{*} = \pounds 1500 enables them to buy their original bundle A with x = 50. This increase in the income level corresponds to a shift outwards in the new budget line to a position represented by the blue budget line in the diagram.

In the sense that the original bundle A is affordable again (so purchasing power has remained constant), the consumer is now as well off as before, but the original bundle A is no longer the utility-maximising one at the new price and the higher income level. The consumer will want to adjust the bundle until the utility function is maximised at the new price and new income. The amount of x the consumer will actually demand at the new price and new income level will be

x(p_1^{*}, m^{*}) = \frac{1500}{(2)(20)} = 37.5

This corresponds to the amount of x in the bundle B in the diagram above, and is usually referred to in the literature as the compensated demand for x (as opposed to the uncompensated demand at point A). The pure substitution effect of the price rise (i.e., abstracting from the income effect) is then the change in demand for x when the price of x changes to p_1^{*} and at the same time the income level changes to m^{*} to keep the consumer’s purchasing power constant:

x(p_1^{*}, m^{*}) - x(p_1, m) = 37.5 - 50 = -12.5

This is the change in the amount of x represented by the shift from bundle A to bundle B in the diagram above.

In this numerical example, the pure substitution effect of the price rise accounts for exactly half of the total drop in the demand for x from 50 at point A to 25 at point C. The other half of the drop in the demand for x is accounted for by the income effect (sometimes called the ‘wealth’ effect) of the price rise, which is represented in the diagram above by a parallel shift inwards of the blue budget line to the position of the final budget line on which bundle C lies. This is the change in demand for x when we change the income level from m^{*} back to m, holding the price of x fixed at p_1^{*}. Thus, the income effect is computed as

x(p_1^{*}, m) - x(p_1^{*}, m^{*}) = 25 - 37.5 = -12.5

The substitution effect plus the income effect together account for the full drop in the demand for x as a result of moving from bundle A to bundle C in response to the price rise of x.

In this simple numerical example Slutsky’s decomposition equation takes the discrete form

x(p_1^{*}, m) - x(p_1, m) = \big \{ x(p_1^{*}, m^{*}) - x(p_1, m) \big \} + \big\{ x(p_1^{*} , m) - x(p_1^{*}, m^{*}) \big \}
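For readers who like to see this numerically, the following Python sketch (my own illustration; x_demand is just a label for the Marshallian demand derived above) reproduces the whole decomposition:

```python
# Numerical check of the discrete Slutsky decomposition for the example above.
def x_demand(p1, m):
    """Marshallian demand for x under u = x^(1/2) y^(1/2): x = m / (2 p1)."""
    return m / (2 * p1)

p1, p1_star, m = 10, 20, 1000
x_A = x_demand(p1, m)                 # 50.0, bundle A
m_star = m + (p1_star - p1) * x_A     # 1500, Slutsky-compensated income
x_B = x_demand(p1_star, m_star)       # 37.5, bundle B
x_C = x_demand(p1_star, m)            # 25.0, bundle C

substitution_effect = x_B - x_A       # -12.5
income_effect = x_C - x_B             # -12.5
total_effect = x_C - x_A              # -25.0
assert substitution_effect + income_effect == total_effect
print(substitution_effect, income_effect, total_effect)
```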

As we pass to the limit of an infinitesimally small price change, Slutsky’s decomposition equation takes the form of a partial differential equation which can be derived for the simple numerical example above (involving the Cobb-Douglas utility function) as follows. Let x^A and y^A be the amounts of x and y consumed at point A in the diagram above, and let

x^s(p_1^{*}, p_2, m^{*}) \equiv x(p_1^{*}, p_2, p_1^{*}x^A + p_2y^A)

be the compensated demand for x at point B in the diagram above, i.e., the demand for x when the consumer is compensated for the rise in the price of x by a rise in income enabling the purchase of the original consumption bundle at A. Partially differentiating both sides of this identity with respect to p_1^{*} we get

\frac{\partial x^s(p_1^{*}, \  p_2, \  m^{*})}{\partial p_1^{*}} = \frac{\partial x(p_1^{*}, \  p_2, \  p_1^{*}x^A + p_2y^A)}{\partial p_1^{*}} + \frac{\partial x(p_1^{*}, \  p_2, \  p_1^{*}x^A + p_2y^A)}{\partial m^{*}}\cdot x^A

Rearranging this gives the partial differential equation form of Slutsky’s decomposition equation:

\frac{\partial x(p_1^{*}, \  p_2, \  p_1^{*}x^A + p_2y^A)}{\partial p_1^{*}} = \frac{\partial x^s(p_1^{*}, \  p_2,  \  m^{*})}{\partial p_1^{*}} - \frac{\partial x(p_1^{*}, \  p_2, \  p_1^{*}x^A + p_2y^A)}{\partial m^{*}}\cdot x^A

As before, this says that the total effect of a price change is composed of a substitution effect (with income adjusted to maintain the initial purchasing power) and an income effect.

Note that the compensated demand for good x at point B, namely x^s(p_1^{*}, \  p_2, \  m^{*}), can be obtained as the solution to the problem of minimising the expenditure required to achieve the level of utility associated with bundle A in the diagram given the prices p_1^{*} and p_2. Explicitly, we seek to minimise

E = p_1^{*}x + p_2 y

subject to the condition

x^{1/2}y^{1/2} = u(x^A, y^A)

From the constraint we have

y = \frac{u^2(x^A, \  y^A)}{x}

Substituting this into the objective function, the problem reduces to that of minimising

E = p_1^{*}x + p_2\frac{u^2(x^A, \  y^A)}{x}

Taking the first derivative and setting it equal to zero we get

\frac{\text{d}E}{\text{d}x} = p_1^{*} - p_2\frac{u^2(x^A, \   y^A)}{x^2} = 0

so

x = \frac{(p_2)^{1/2}}{(p_1^{*})^{1/2}}u(x^A, \  y^A)

This is the required compensated demand function.

Demand functions obtained in this way are called Hicksian demand functions and are usually denoted by the letter h. Thus, we have

x^s(p_1^{*}, p_2, m^{*}) = h(p_1^{*}, p_2, u(x^A, \  y^A)) = \frac{(p_2)^{1/2}}{(p_1^{*})^{1/2}}u(x^A, \  y^A)

and we can write Slutsky’s decomposition equation in the more usual form encountered in the literature (using the Hicksian demand function) as

\frac{\partial x(p_1^{*}, \  p_2, \  p_1^{*}x^A + p_2y^A)}{\partial p_1^{*}} = \frac{\partial h(p_1^{*}, \  p_2,  \  u(x^A, \  y^A))}{\partial p_1^{*}} - \frac{\partial x(p_1^{*}, \  p_2, \  p_1^{*}x^A + p_2y^A)}{\partial m^{*}}\cdot x^A

To check that this partial differential equation works in the context of our Cobb-Douglas example above, we can compute the partial derivatives explicitly. Since

x(p_1^{*}, \  p_2, \  p_1^{*}x^A + p_2y^A) = \frac{m^{*}}{2p_1^{*}}

we have

\frac{\partial x(p_1^{*}, \  p_2, \  p_1^{*}x^A + p_2y^A)}{\partial p_1^{*}} = - \frac{m^{*}}{2(p_1^{*})^2}

We also have

\frac{\partial h(p_1^{*}, \  p_2,  \  u(x^A, \  y^A))}{\partial p_1^{*}} = -\frac{1}{2}\frac{(p_2)^{1/2}}{(p_1^{*})^{3/2}} u(x^A, \ y^A)

= -\frac{1}{2}\frac{(p_2)^{1/2}}{(p_1^{*})^{3/2}}\big(\frac{m^{*}}{2p_1^{*}}\big)^{1/2}\big(\frac{m^{*}}{2p_2}\big)^{1/2}

(substituting x^A = \frac{m^{*}}{2p_1^{*}} and y^A = \frac{m^{*}}{2p_2}, since we are evaluating the equation at the initial optimum A)

= -\frac{m^{*}}{4(p_1^{*})^2}

and

-\frac{\partial x(p_1^{*}, \  p_2, \  p_1^{*}x^A + p_2y^A)}{\partial m^{*}}\cdot x^A = -\frac{1}{2p_1^{*}}\frac{m^{*}}{2p_1^{*}} = -\frac{m^{*}}{4(p_1^{*})^2}

Putting these into the partial differential form of Slutsky’s equation we see that the equation is satisfied.
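The verification just carried out by hand can also be automated. Here is a sympy sketch (again my own check; the symbol names are arbitrary) confirming that the two sides of the equation agree once both are evaluated at the initial optimum:

```python
# Symbolic check of the Slutsky equation for the Cobb-Douglas example.
import sympy as sp

p1, p2, xA, yA, m = sp.symbols('p1 p2 xA yA m', positive=True)

x_marsh = m / (2 * p1)                    # Marshallian demand for x
h = sp.sqrt(p2 / p1) * sp.sqrt(xA * yA)   # Hicksian demand at utility u(xA, yA)
m_star = p1 * xA + p2 * yA                # compensated income m*

lhs = sp.diff(x_marsh, p1).subs(m, m_star)                       # price effect at m = m*
rhs = sp.diff(h, p1) - sp.diff(x_marsh, m).subs(m, m_star) * xA  # substitution - income

# Evaluate both sides at the initial optimum, where xA = m*/(2 p1), yA = m*/(2 p2)
ms = sp.Symbol('ms', positive=True)
at_A = {xA: ms / (2 * p1), yA: ms / (2 * p2)}
print(sp.simplify(lhs.subs(at_A) - rhs.subs(at_A)))              # 0
```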

A problem involving the use of exterior derivatives of differential forms to re-express the classical gradient, curl and divergence operations

Modern differential geometry makes extensive use of differential forms and of the exterior derivative, a concept developed by the French mathematician Élie Cartan (1869-1951). A Wikipedia article about exterior derivatives of differential forms can be found here. As alluded to in that article, exterior derivatives of differential forms encompass many results usually expressed in terms of vector fields in classical vector calculus. In particular, there is a duality between 1-forms, 2-forms and vector fields which allows the classical gradient, curl and divergence operations of vector calculus to be fully subsumed within the realm of exterior derivatives. In the present note I want to briefly explore how these three differentiation operations of vector calculus can be replaced with Cartan’s exterior derivative. The necessary notation and motivation are nicely encapsulated in the following problem, which appears in Barrett O’Neill’s Elementary Differential Geometry (Revised Second Edition, p. 33):

[Image: the statement of the problem from O’Neill, which sets up correspondences (1) and (2) between 1-forms/2-forms and vector fields on \mathbb{R}^3 and asks for grad, curl and div to be re-expressed via exterior derivatives in parts (a), (b) and (c)]

This problem was also the subject of an interesting Mathematics Stack Exchange discussion which can be found here. The reader should attempt to solve the problem before reading my solution below.

To solve part (a), we use the fact that if f is a differentiable real-valued function on \mathbb{R}^3 and \bold{v}_p is a tangent vector with point of application \bold{p} and vector part \bold{v}, then the differential df of f is the 1-form such that

df(\bold{v}_p) = \sum v_i \frac{\partial f}{\partial x_i}(\bold{p}) = \sum \frac{\partial f}{\partial x_i}(\bold{p}) dx_i(\bold{v}_p)

(where the last equality uses the fact that the differentials of the natural coordinate functions, evaluated at a tangent vector, are equal to the coordinates v_i of the vector part of the tangent vector). But using the correspondence (1) between 1-forms and vector fields in the problem we can then write

df(\bold{v}_p) = \sum \frac{\partial f}{\partial x_i}(\bold{p}) dx_i(\bold{v}_p) \stackrel{\mathrm{(1)}}{\longleftrightarrow} \sum \frac{\partial f}{\partial x_i}(\bold{p}) U_i(\bold{p}) = \text{grad } f(\bold{p})

(where the U_i(\bold{p}) are the natural frame field vectors at the point of application \bold{p}). Therefore we have shown that

df \stackrel{\mathrm{(1)}}{\longleftrightarrow} \text{grad } f

I emphasised a specific tangent vector argument \bold{v}_p in the above solution but I will not do this in the solutions for (b) and (c) as the notation becomes too cumbersome. To solve part (b), we consider the 1-form

\phi = f_1 dx_1 + f_2 dx_2 + f_3 dx_3

The exterior derivative of \phi is the 2-form

d \phi = df_1 \wedge dx_1 + df_2 \wedge dx_2 + df_3 \wedge dx_3

=

\big(\frac{\partial f_1}{\partial x_1} dx_1 + \frac{\partial f_1}{\partial x_2} dx_2 + \frac{\partial f_1}{\partial x_3} dx_3 \big) \wedge dx_1

+ \big(\frac{\partial f_2}{\partial x_1} dx_1 + \frac{\partial f_2}{\partial x_2} dx_2 + \frac{\partial f_2}{\partial x_3} dx_3 \big) \wedge dx_2

+ \big(\frac{\partial f_3}{\partial x_1} dx_1 + \frac{\partial f_3}{\partial x_2} dx_2 + \frac{\partial f_3}{\partial x_3} dx_3 \big) \wedge dx_3

=

-\frac{\partial f_1}{\partial x_2} dx_1 dx_2 - \frac{\partial f_1}{\partial x_3} dx_1 dx_3

+ \frac{\partial f_2}{\partial x_1} dx_1 dx_2 - \frac{\partial f_2}{\partial x_3} dx_2 dx_3

+ \frac{\partial f_3}{\partial x_1} dx_1 dx_3 + \frac{\partial f_3}{\partial x_2} dx_2 dx_3

= \big( \frac{\partial f_2}{\partial x_1} - \frac{\partial f_1}{\partial x_2} \big) dx_1 dx_2 + \big( \frac{\partial f_3}{\partial x_1} - \frac{\partial f_1}{\partial x_3} \big) dx_1 dx_3 + \big( \frac{\partial f_3}{\partial x_2} - \frac{\partial f_2}{\partial x_3} \big) dx_2 dx_3

But using the correspondence (2) between 2-forms and vector fields in the problem we can then write

d \phi = \big( \frac{\partial f_2}{\partial x_1} - \frac{\partial f_1}{\partial x_2} \big) dx_1 dx_2 + \big( \frac{\partial f_3}{\partial x_1} - \frac{\partial f_1}{\partial x_3} \big) dx_1 dx_3 + \big( \frac{\partial f_3}{\partial x_2} - \frac{\partial f_2}{\partial x_3} \big) dx_2 dx_3

\stackrel{\mathrm{(2)}}{\longleftrightarrow}

\big( \frac{\partial f_3}{\partial x_2} - \frac{\partial f_2}{\partial x_3} \big) U_1 + \big( \frac{\partial f_1}{\partial x_3} - \frac{\partial f_3}{\partial x_1} \big) U_2 + \big( \frac{\partial f_2}{\partial x_1} - \frac{\partial f_1}{\partial x_2} \big) U_3

= \text{curl } V

Therefore we have shown that

d \phi \stackrel{\mathrm{(2)}}{\longleftrightarrow} \text{curl } V

Finally, to solve part (c) we can consider the 2-form

\eta = f_1 dy dz + f_2 dz dx + f_3 dx dy

which corresponds to the vector field V = \sum f_i U_i under the correspondence (2) in the problem (note the ordering of the middle term: dz dx = -dx dz, which is what makes the signs work out), that is,

\eta \stackrel{\mathrm{(2)}}{\longleftrightarrow} V

The exterior derivative of \eta is the 3-form

d \eta = df_1 \wedge dy dz + df_2 \wedge dz dx + df_3 \wedge dx dy

Since any wedge product containing a repeated differential vanishes, and since dy dz dx = dz dx dy = dx dy dz (cyclic permutations of a 3-form are even), we see immediately that this reduces to

d \eta = \big( \frac{\partial f_1}{\partial x} dx \big) dy dz + \big( \frac{\partial f_2}{\partial y} dy \big) dz dx + \big( \frac{\partial f_3}{\partial z} dz \big) dx dy

= \big(\frac{\partial f_1}{\partial x} + \frac{\partial f_2}{\partial y} + \frac{\partial f_3}{\partial z} \big) dx dy dz

= (\text{div } V) dx dy dz
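These correspondences are easy to test symbolically. The following sympy sketch (my own addition; it compares the coefficient formulas derived above against the curl and divergence built into sympy.vector, for an arbitrarily chosen vector field) confirms parts (b) and (c):

```python
# Check that the coefficients read off from d(phi) and d(eta) agree with the
# classical curl and divergence, for a concrete choice of f1, f2, f3.
import sympy as sp
from sympy.vector import CoordSys3D, curl, divergence

N = CoordSys3D('N')
x1, x2, x3 = N.x, N.y, N.z

f1, f2, f3 = x1 * x2, x2 * x3**2, sp.sin(x1 * x3)   # arbitrary components
V = f1 * N.i + f2 * N.j + f3 * N.k

# Components of curl V as read off from d(phi) via correspondence (2)
c1 = sp.diff(f3, x2) - sp.diff(f2, x3)
c2 = sp.diff(f1, x3) - sp.diff(f3, x1)
c3 = sp.diff(f2, x1) - sp.diff(f1, x2)

print(curl(V) - (c1 * N.i + c2 * N.j + c3 * N.k))   # zero vector
print(divergence(V) - (sp.diff(f1, x1) + sp.diff(f2, x2) + sp.diff(f3, x3)))  # 0
```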


Proof that a Gauss sum associated with a quadratic character mod p is the same as a quadratic Gauss sum

In this note I want to quickly record a solution I have found to the following problem: If p is an odd prime such that p \nmid n and \chi(r) = (r|p), prove that

G(n, \chi) = \sum_{r \text{mod } p} (r|p) e^{2 \pi i n r/p} = \sum_{r = 1}^p e^{2 \pi i n r^2/p} = G(n ; p)

My solution is as follows. We have

G(n, \chi) = \sum_{r \text{mod } p} (r|p) e^{2 \pi i n r/p}

= \sum_{\substack{\text{quadratic}\\\text{residues }r}} e^{2 \pi i n r/p} - \sum_{\substack{\text{quadratic}\\\text{non-residues }s}} e^{2 \pi i n s/p}

whereas

G(n ; p) = \sum_{r = 1}^p e^{2 \pi i n r^2/p}

= \sum_{r = 1}^{p-1} e^{2 \pi i n r^2/p} + 1

= 2\sum_{\substack{\text{quadratic}\\\text{residues }r}} e^{2 \pi i n r/p} + 1

(since, as r ranges over 1, \ldots, p - 1, it passes through both quadratic residues and non-residues, and r^2 then runs through the quadratic residues exactly twice, because r and p - r have the same square)

= \sum_{\substack{\text{quadratic}\\\text{residues }r}} e^{2 \pi i n r/p}

+ \big \{ \sum_{\substack{\text{quadratic}\\ \text{residues }r}} e^{2 \pi i n r/p} + 1 \big \}

Therefore G(n, \chi) = G(n ;  p) if and only if

- \sum_{\substack{\text{quadratic}\\\text{non-residues }s}} e^{2 \pi i n s/p}

= \sum_{\substack{\text{quadratic}\\ \text{residues }r}} e^{2 \pi i n r/p} + 1

\iff

\sum_{\substack{\text{quadratic}\\ \text{residues }r}} e^{2 \pi i n r/p} + \sum_{\substack{\text{quadratic}\\ \text{non-residues }s}} e^{2 \pi i n s/p} = -1

But this is true because

\sum_{\substack{\text{quadratic}\\ \text{residues }r}} e^{2 \pi i n r/p} + \sum_{\substack{\text{quadratic}\\ \text{non-residues }s}} e^{2 \pi i n s/p} = \sum_{r=1}^{p-1} e^{2 \pi inr/p}

This is a geometric sum of the form

S = \sum_{r=1}^{p-1} x^r = x + x^2 + \cdots + x^{p-1}

where

x = e^{2 \pi in/p}

Therefore

xS = x^2 + x^3 + \cdots + x^p

Subtracting the expression for S from this we get

(x - 1)S = x^p - x = 1 - x

(the last equality is true because x is a p-th root of unity; moreover x \neq 1 since p \nmid n, so we may divide through by x - 1)

\iff

S = -1

Therefore G(n, \chi) = G(n ; p).
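The identity is also easy to confirm numerically for small primes. A short Python sketch (my own check, computing the Legendre symbol via Euler's criterion):

```python
# Numerical check that G(n, chi) = G(n; p) for small odd primes p with p not dividing n.
import cmath

def legendre(r, p):
    """Legendre symbol (r|p) via Euler's criterion, for an odd prime p."""
    r %= p
    if r == 0:
        return 0
    return 1 if pow(r, (p - 1) // 2, p) == 1 else -1

def G_chi(n, p):   # sum over r mod p of (r|p) e^{2 pi i n r / p}
    return sum(legendre(r, p) * cmath.exp(2j * cmath.pi * n * r / p) for r in range(p))

def G_quad(n, p):  # sum over r = 1, ..., p of e^{2 pi i n r^2 / p}
    return sum(cmath.exp(2j * cmath.pi * n * r * r / p) for r in range(1, p + 1))

for p in (3, 5, 7, 11, 13):
    for n in (1, 2, 3):
        if n % p:
            assert abs(G_chi(n, p) - G_quad(n, p)) < 1e-9, (n, p)
print("G(n, chi) = G(n; p) confirmed numerically")
```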

On the classification of singularities, with an application to non-rotating black holes

In mathematics a singularity is a point at which a mathematical object (e.g., a function) is not defined or behaves ‘badly’ in some way. Singularities can be isolated (e.g., removable singularities, poles and essential singularities) or nonisolated (e.g., branch cuts). For teaching purposes, I want to delve into some of the mathematical aspects of isolated singularities in this note using simple examples involving the complex sine function. I will not consider nonisolated singularities in detail; these are briefly discussed with some examples in this Wikipedia page. I will also briefly look at how singularities arise in the context of black hole physics in a short final section.

[Figure: a punctured open disc \{z: 0 < |z - \alpha| < r\} centred at \alpha]

Definition: A function f has an isolated singularity at the point \alpha if f is analytic on a punctured open disc \{z: 0 < |z - \alpha| < r \}, where r > 0, but not at \alpha itself.

Note that a function f is analytic at a point \alpha if it is differentiable on a region containing \alpha. Strangely, a function can have a derivative at a point without being analytic there. For example, the function f(z) = |z|^2 has a derivative at z = 0 but at no other point, as can easily be verified using the Cauchy-Riemann equations. Therefore this function is not analytic at z = 0. Also note with regard to the definition of an isolated singularity that the function MUST be analytic on the ‘whole’ of the punctured open disc for the singularity to be defined. For example, despite appearances, the function

f(z) = \frac{1}{\sqrt{z}}

does not have a singularity at z = 0 because it is impossible to define a punctured open disc centred at 0 on which f(z) is analytic (the function z \rightarrow \sqrt{z} is discontinuous everywhere on the negative real axis, so f(z) fails to be analytic there).

I find it appealing that all three types of isolated singularity (removable, poles and essential singularities) can be illustrated by using members of the following family of functions:

f(z) = \frac{\sin(z^m)}{z^n}

where m, n \in \mathbb{Z}. For example, if m = n = 1 we get

f_1(z) = \frac{\sin(z)}{z}

which has a removable singularity at z = 0. If m = 1, n = 3 we get

f_2(z) = \frac{\sin(z)}{z^3}

which has a pole of order 2 at z = 0. Finally, if m = -1, n = 0 we get

f_3(z) = \sin\big( \frac{1}{z} \big)

which has an essential singularity at z = 0. In each of these three cases, the function is not analytic at z = 0 but is analytic on a punctured open disc with centre 0, e.g., \{z: 0 < |z| < 1\} or indeed \mathbb{C} - \{0\} (which can be thought of as a punctured disc with infinite radius). In what follows I will use these three examples to delve into structural definitions of the three types of singularity. I will then explore their classification using Laurent series expansions.
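Before turning to the formal definitions, the three behaviours can be glimpsed numerically. The following Python sketch (my own illustration) evaluates the three functions along sequences approaching 0:

```python
# f1 approaches a finite limit, |f2| blows up like 1/z^2, and f3 oscillates.
import cmath

def f1(z): return cmath.sin(z) / z
def f2(z): return cmath.sin(z) / z**3
def f3(z): return cmath.sin(1 / z)

for k in range(1, 5):
    z = 10.0**(-k)   # approach 0 along the positive real axis
    print(f"z = {z:<8}  f1 = {f1(z).real:.8f}   |f2| = {abs(f2(z)):.3e}")

# Along z_n = 2/((2n+1) pi), f3 takes the values (-1)^n and so has no limit
for n in range(4):
    zn = 2 / ((2 * n + 1) * cmath.pi)
    print(f"f3(z_{n}) = {f3(zn).real:+.6f}")
```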

Structural definitions of isolated singularities

Removable singularities

Suppose a function f is analytic on the punctured open disc

\{z: 0 < |z - \alpha| < r\}

and has a singularity at \alpha. The function f has a removable singularity at \alpha if there is a function g which is analytic at \alpha such that

f(z) = g(z) for 0 < |z - \alpha| < r

We can see that g extends the analyticity of f to include \alpha, so we say that g is an analytic extension of f to the disc

\{z: |z - \alpha| < r \}

With removable singularities we always have that \lim_{z \rightarrow \alpha} f(z) exists since

\lim_{z \rightarrow \alpha} f(z) = g(\alpha)

(this will not be true for the other types of singularity) and the name of this singularity comes from the fact that we can effectively ‘remove’ the singularity by defining f(\alpha) = g(\alpha).

To apply this to the function

f_1(z) = \frac{\sin(z)}{z}

we first observe that the Maclaurin series expansion of \sin(z) is

\sin(z) = z - \frac{z^3}{3!} + \frac{z^5}{5!} - \frac{z^7}{7!} + \cdots for z \in \mathbb{C}

Therefore we can write

f_1(z) = 1 - \frac{z^2}{3!} + \frac{z^4}{5!} - \frac{z^6}{7!} + \cdots for z \in \mathbb{C} - \{0\}

If we then set

g(z) = 1 - \frac{z^2}{3!} + \frac{z^4}{5!} - \frac{z^6}{7!} + \cdots for z \in \mathbb{C}

we see that g(z) extends the analyticity of f_1(z) to include z = 0. We also see that

\lim_{z \rightarrow 0} f_1(z) = g(0)

Therefore f_1(z) has a removable singularity at z = 0.

Poles of order k, k > 0

Suppose a function f is analytic on the punctured open disc

\{z: 0 < |z - \alpha| < r\}

and has a singularity at \alpha. The function f has a pole of order k at \alpha if there is a function g, analytic at \alpha with g(\alpha) \neq 0, such that

f(z) = \frac{g(z)}{(z - \alpha)^k} for 0 < |z - \alpha| < r

With poles of order k we always have that

f(z) \rightarrow \infty as z \rightarrow \alpha

(which distinguishes them from removable singularities)

and

\lim_{z \rightarrow \alpha} (z - \alpha)^k f(z)

exists and is nonzero (since \lim_{z \rightarrow \alpha} (z - \alpha)^k f(z) = g(\alpha) \neq 0).

To apply this to the function

f_2(z) = \frac{\sin(z)}{z^3}

we first observe that

f_2(z) = \frac{\sin(z)/z}{z^2} = \frac{g(z)}{z^2} for z \in \mathbb{C} - \{0\}

where g is the function

g(z) = 1 - \frac{z^2}{3!} + \frac{z^4}{5!} - \frac{z^6}{7!} + \cdots for z \in \mathbb{C}

Since g(0) = 1 \neq 0, we see that f_2(z) behaves like \frac{1}{z^2} near z = 0 and

f_2(z) \rightarrow \infty as z \rightarrow 0

so the singularity at z = 0 is not removable. We also see that

\lim_{z \rightarrow 0} z ^2 f_2(z) = g(0) = 1

Therefore the function f_2(z) has a pole of order 2 at z = 0.

Essential singularities

Suppose a function f is analytic on the punctured open disc

\{z: 0 < |z - \alpha| < r\}

and has a singularity at \alpha. The function f has an essential singularity at \alpha if the singularity is neither removable nor a pole. Such a singularity cannot be removed in any way, not even by multiplying f by a power (z - \alpha)^k, hence the name.

With essential singularities we have that

\lim_{z \rightarrow \alpha} f(z)

does not exist, and f(z) does not tend to infinity as z \rightarrow \alpha.

To apply this to the function

f_3(z) = \sin\big( \frac{1}{z}\big)

we observe that if we restrict the function to the real axis and consider a sequence of points

z_n = \frac{2}{(2n + 1) \pi}

then we have that z_n \rightarrow 0 whereas

f_3(z_n) = \sin\big(\frac{(2n + 1) \pi}{2}\big) = (-1)^n

Therefore

\lim_{z \rightarrow 0} f_3(z)

does not exist, so the singularity is not removable, but it is also the case that

f_3(z) \not \rightarrow \infty as z \rightarrow 0

so the singularity is not a pole. Since it is neither a removable singularity nor a pole, it must be an essential singularity.

Classification of isolated singularities using Laurent series

By Laurent’s Theorem, a function f which is analytic on an open annulus

A = \{z: 0 \leq r_1 < |z - \alpha| < r_2 \leq \infty \}

[Figure: an open annulus centred at \alpha with inner radius r_1 and outer radius r_2]

(shown in the diagram) can be represented as an extended power series of the form

f(z) = \sum_{n = -\infty}^{\infty} a_n(z - \alpha)^n

= \cdots + \frac{a_{-2}}{(z - \alpha)^2} + \frac{a_{-1}}{(z - \alpha)} + a_0 + a_1 (z - \alpha) + a_2 (z - \alpha)^2 + \cdots

for z \in A, which converges at all points in the annulus. It is an ‘extended’ power series because it involves negative powers of (z - \alpha). (The part of the power series involving negative powers is often referred to as the singular part. The part involving non-negative powers is referred to as the analytic part). This extended power series representation is the Laurent series about \alpha for the function f on the annulus A. Laurent series are also often used in the case when A is a punctured open disc, in which case we refer to the series as the Laurent series about \alpha for the function f.

The Laurent series representation of a function on an annulus A is unique. We can often use simple procedures, such as finding ordinary Maclaurin or Taylor series expansions, to obtain an extended power series and we can feel safe in the knowledge that the power series thus obtained must be the Laurent series.

Laurent series expansions can be used to classify singularities by virtue of the following result: If a function f has a singularity at \alpha and if its Laurent series expansion about \alpha is

f(z) = \sum_{n = -\infty}^{\infty} a_n(z - \alpha)^n

then

(a) f has a removable singularity at \alpha iff a_n = 0 for all n < 0;

(b) f has a pole of order k at \alpha iff a_n = 0 for all n < -k and a_{-k} \neq 0;

(c) f has an essential singularity at \alpha iff a_n \neq 0 for infinitely many n < 0.

To apply this to our three examples, observe that the function

f_1(z) = \frac{\sin(z)}{z}

has a singularity at 0 and its Laurent series expansion about 0 is

\frac{\sin(z)}{z} = 1 - \frac{z^2}{3!} + \frac{z^4}{5!} - \frac{z^6}{7!} + \cdots

for z \in \mathbb{C} - \{0\}. This has no non-zero coefficients in its singular part (i.e., it only has an analytic part) so the singularity is a removable one.

The function

f_2(z) = \frac{\sin(z)}{z^3}

has a singularity at 0 and its Laurent series expansion about 0 is

\frac{\sin(z)}{z^3} = \frac{1}{z^2} - \frac{1}{3!} + \frac{z^2}{5!} - \cdots

for z \in \mathbb{C} - \{0\}. This has a_n = 0 for all n < -2 and a_{-2} \neq 0, so the singularity in this case is a pole of order 2.

Finally, the function

f_3(z) = \sin\big( \frac{1}{z} \big)

has a singularity at 0 and its Laurent series expansion about 0 is

\sin \big(\frac{1}{z} \big) = \frac{1}{z} - \frac{1}{3! z^3} + \frac{1}{5! z^5} - \cdots

for z \in \mathbb{C} - \{0\}. This has a_n \neq 0 for infinitely many n < 0 so the singularity here is an essential singularity.
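All three classifications can be read off mechanically from series expansions. Here is a sympy sketch (my own check) reproducing the expansions above; series() handles the negative powers directly in the first two cases, while for the essential singularity we expand \sin(w) at w = 0 and substitute w = 1/z:

```python
# Laurent expansions about 0 for the three example functions.
import sympy as sp

z, w = sp.symbols('z w')
print(sp.series(sp.sin(z) / z,    z, 0, 6))   # 1 - z**2/6 + z**4/120 + O(z**6)
print(sp.series(sp.sin(z) / z**3, z, 0, 4))   # z**(-2) - 1/6 + z**2/120 + O(z**4)
# Essential singularity: infinitely many negative powers
print(sp.series(sp.sin(w), w, 0, 8).removeO().subs(w, 1 / z))
```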

Singularities in Schwarzschild black holes

One often hears about singularities in the context of black hole physics and I wanted to quickly look at singularities in the particular case of non-rotating black holes. A detailed investigation of the various singularities that appear in exact solutions of Einstein’s field equations was conducted in the 1960s and 1970s by Penrose, Hawking, Geroch and others. See, e.g., this paper by Penrose and Hawking. There is now a vast literature on this topic. The following discussion is just my own quick look at how the ideas might arise.

The spacetime of a non-rotating spherical black hole is usually analysed using the Schwarzschild solution of the Einstein field equations for an isolated spherical mass m. In spherical coordinates this is the metric

\Delta \tau = \bigg[ \big(1 - \frac{k}{r}\big) (\Delta t)^2 - \frac{1}{c^2} \bigg\{\frac{(\Delta r)^2}{\big(1 - \frac{k}{r}\big)} + r^2(\Delta \theta)^2 + r^2 \sin^2 \theta (\Delta \phi)^2\bigg\} \bigg]^{1/2}

where

k = \frac{2mG}{c^2} and m is the mass of the spherically symmetric static object exterior to which the Schwarzschild metric applies. If we consider only radial motion (i.e., world lines for which \Delta \theta = \Delta \phi = 0) the Schwarzschild metric simplifies to

(\Delta \tau)^2 = \big(1 - \frac{k}{r}\big) (\Delta t)^2 - \frac{1}{c^2}\frac{(\Delta r)^2}{\big(1 - \frac{k}{r}\big)}

We can see that the coefficient of (\Delta r)^2 in the metric becomes infinite at r = k, so there is apparently a singularity here. However, this singularity is ‘removable’ by re-expressing the metric in a new set of coordinates, r and t^{\prime}, known as the Eddington-Finkelstein coordinates. The transformed metric has the form

(\Delta \tau)^2 = \big(1 - \frac{k}{r}\big) (\Delta t^{\prime})^2 - \frac{2k \Delta t^{\prime} \Delta r}{cr} - \frac{(\Delta r)^2}{c^2}\big(1 + \frac{k}{r}\big)

which does not behave badly at r = k. In general relativity, this type of removable singularity is known as a coordinate singularity. Another example is the apparent singularity at the poles (latitude \pm 90^{\circ}) in spherical coordinates, which disappears when a different coordinate system is used.

Since the term \frac{k}{r} in the Schwarzschild metric blows up at r = 0, we also have a singularity at this point. This one is not removable, and in terms of the earlier discussion it can be recognised as arising from a pole of order 1 (also called a simple pole) in the metric coefficient 1 - \frac{k}{r}.

 

Different possible branch cuts for the principal argument, principal logarithm and principal square root functions

For some work I was doing with a student, I was trying to find different ways of proving the familiar result that the complex square root function f(z) = \sqrt{z} is discontinuous everywhere on the negative real axis. As I was working on alternative proofs it became very clear to me how ‘sensitive’ all the proofs were to the particular definition of the principal argument I was using, namely that the principal argument \theta = \text{Arg}z is the unique argument of z satisfying -\pi < \theta \leq \pi. In a sense, this definition ‘manufactures’ the discontinuity of the complex square root function on the negative real axis, because the principal argument function itself is discontinuous here: the principal argument of a sequence of points approaching the negative real axis from above will tend to \pi, whereas the principal argument of a sequence approaching the same point on the negative real axis from below will tend to -\pi. I realised that all the proofs I was coming up with were exploiting this discontinuity of the principal argument function. However, this particular choice of principal argument function is completely arbitrary. An alternative could be to say that the principal argument of z is the unique argument satisfying 0 \leq \theta < 2\pi, which we can call \text{Arg}_{2\pi} z. The effect of this choice of principal argument function is to make the complex square root function discontinuous everywhere on the positive real axis! It turns out that we can choose infinitely many different rays to be lines of discontinuity for the complex square root function, simply by choosing different definitions of the principal argument function. The same applies to the complex logarithm function. In this note I want to record some of my thoughts about this.

The reason for having to specify principal argument functions in the first place is that we need to make complex functions of complex variables single-valued rather than multiple-valued, to make them well-behaved with regard to operations like differentiation. Specifying a principal argument function in order to make a particular complex function single-valued is called choosing a branch of the function. If we specify the principal argument function to be f(z) = \text{Arg} z where -\pi < \text{Arg} z \leq \pi then we define the principal branch of the logarithm function to be

\text{Log} z = \text{log}_e |z| + i \text{Arg} z

for z \in \mathbb{C} - \{0\}, and the principal branch of the square root function to be

z^{\frac{1}{2}} = \text{exp}\big(\frac{1}{2} \text{Log} z \big)

for z \in \mathbb{C} with z \neq 0.

If we define the functions \text{Log} z and z^{\frac{1}{2}} in this way they will be single-valued, but the cost of doing this is that they will not be continuous on the whole of the complex plane (essentially because of the discontinuity of the principal argument function, which both functions ‘inherit’). They will be discontinuous everywhere on the negative real axis. The negative real axis is known as a branch cut for these functions. Using this terminology, what I want to explore in this short note is the fact that different choices of branch for these functions will result in different branch cuts for them.

To begin with, let’s formally prove the discontinuity of the principal argument function f(z) = \text{Arg} z, z \neq 0, and then see how this discontinuity is ‘inherited’ by the principal logarithm and square root functions. For the purposes of the proof we can consider the sequence of points

z_n = |\alpha| \text{e}^{(-\pi + 1/n)i}

where

\alpha \in \{ x \in \mathbb{R}: x < 0 \}

Clearly, as n \rightarrow \infty, we have z_n \rightarrow -|\alpha| = \alpha. However,

f(z_n) = \text{Arg} \big( |\alpha| \text{e}^{(-\pi + 1/n)i}\big)

= -\pi + \frac{1}{n}

\rightarrow -\pi

whereas

f(\alpha) = \text{Arg}\big(|\alpha| \text{e}^{\pi i} \big) = \pi

Therefore f(z_n) \not \rightarrow f(\alpha), so the principal argument function is discontinuous at all points on the negative real axis.

Now consider how the following proof of the discontinuity of f(z) = z^{\frac{1}{2}} on the negative real axis depends crucially on the discontinuity of \text{Arg} z. We again consider the sequence of points

z_n = |\alpha| \text{e}^{(-\pi + 1/n)i}

where

\alpha \in \{ x \in \mathbb{R}: x < 0 \}

so that z_n \rightarrow -|\alpha| = \alpha. However,

f(z_n) = z_n^{\frac{1}{2}} = \text{exp}\big(\frac{1}{2} \text{Log} z_n \big)

= \text{exp}\big( \frac{1}{2} \text{log}_e |z_n| + \frac{1}{2} i \text{Arg} z_n \big)

= \text{exp}\big( \frac{1}{2} \text{log}_e |\alpha| + \frac{1}{2} i (- \pi + \frac{1}{n}) \big)

\rightarrow |\alpha|^{\frac{1}{2}} \text{e}^{-i \pi /2} = - i |\alpha|^{\frac{1}{2}}

whereas

f(\alpha) = \big( |\alpha| \text{e}^{i \pi}\big)^{\frac{1}{2}}

= |\alpha|^{\frac{1}{2}} \text{e}^{i \pi/2} = i |\alpha|^{\frac{1}{2}}

Therefore f(z_n) \not \rightarrow f(\alpha), so the principal square root function is discontinuous at all points on the negative real axis.

Now suppose we choose a different branch for the principal logarithm and square root functions, say \text{Arg}_{2\pi} z, which as we said earlier satisfies 0 \leq \text{Arg}_{2\pi} z < 2\pi. The effect of this is to change the branch cut of these functions to the positive real axis! The reason is that the principal argument function will now be discontinuous everywhere on the positive real axis, and this discontinuity will again be ‘inherited’ by the principal logarithm and square root functions.

To prove the discontinuity of the principal argument function f(z) = \text{Arg}_{2\pi} z on the positive real axis we can consider the sequence of points

z_n = \alpha \text{e}^{(2 \pi - 1/n)i}

where

\alpha \in \{ x \in \mathbb{R}: x > 0 \}

We have z_n \rightarrow \alpha. However,

f(z_n) = \text{Arg}_{2\pi} \big(\alpha \text{e}^{(2\pi - 1/n)i}\big)

= 2\pi - \frac{1}{n}

\rightarrow 2\pi

whereas

f(\alpha) = \text{Arg}_{2\pi}(\alpha) = 0

Therefore f(z_n) \not \rightarrow f(\alpha), so the principal argument function is discontinuous at all points on the positive real axis.

We can now again see how the following proof of the discontinuity of f(z) = z^{\frac{1}{2}} on the positive real axis depends crucially on the discontinuity of \text{Arg}_{2\pi} z there. We again consider the sequence of points

z_n = \alpha \text{e}^{(2\pi - 1/n)i}

where

\alpha \in \{ x \in \mathbb{R}: x > 0 \}

so that z_n \rightarrow \alpha. However,

f(z_n) = z_n^{\frac{1}{2}} = \text{exp}\big(\frac{1}{2} \text{Log} z_n \big)

= \text{exp}\big( \frac{1}{2} \text{log}_e |z_n| + \frac{1}{2} i \text{Arg}_{2\pi} z_n \big)

= \text{exp}\big( \frac{1}{2} \text{log}_e |\alpha| + \frac{1}{2} i (2 \pi - \frac{1}{n}) \big)

\rightarrow \alpha^{\frac{1}{2}} \text{e}^{i 2 \pi /2} = - \alpha^{\frac{1}{2}}

whereas

f(\alpha) = \alpha^{\frac{1}{2}}

Therefore f(z_n) \not \rightarrow f(\alpha), so the principal square root function is discontinuous at all points on the positive real axis.

There are infinitely many other branches to choose from. In general, if \tau is any real number, we can define the principal argument function to be f(z) = \text{Arg}_{\tau} z where

\tau \leq \text{Arg}_{\tau} z < \tau + 2\pi

and this will give rise to a branch cut for the principal logarithm and square root functions consisting of a ray emanating from the origin and containing all those points z such that \text{arg}(z) = \tau modulo 2\pi.
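To make this concrete, here is a Python sketch (my own illustration; arg_tau and sqrt_tau are hypothetical helper names, not standard library functions) implementing \text{Arg}_{\tau} and the corresponding square root branch, and exhibiting the jump across the cut for \tau = 0:

```python
# A square root with an adjustable branch cut along the ray arg(z) = tau.
import cmath

def arg_tau(z, tau):
    """The unique argument of z lying in the half-open interval [tau, tau + 2*pi)."""
    theta = cmath.phase(z)              # the standard Arg, in (-pi, pi]
    while theta < tau:
        theta += 2 * cmath.pi
    while theta >= tau + 2 * cmath.pi:
        theta -= 2 * cmath.pi
    return theta

def sqrt_tau(z, tau):
    return cmath.exp(0.5 * (cmath.log(abs(z)) + 1j * arg_tau(z, tau)))

# With tau = 0 the branch cut is the positive real axis: approaching z = 1 from
# just above and just below it gives square roots of opposite sign.
eps = 1e-9
print(sqrt_tau(1 + eps * 1j, 0.0))   # approximately +1
print(sqrt_tau(1 - eps * 1j, 0.0))   # approximately -1
```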

Advanced Number Theory Note #16: A proof of the law of quadratic reciprocity using Gauss sums and quadratic characters

The prince of mathematicians (princeps mathematicorum), Carl Friedrich Gauss, arguably the greatest mathematician who ever lived, devoted a lot of attention to exploring alternative proofs of the law of quadratic reciprocity. As I mentioned in a previous note, this is actually a very deep result which has had a profound impact on modern mathematics. A rather good Wikipedia page about the quadratic reciprocity law has a section entitled connection with cyclotomy which makes clear its importance to the development of modern class field theory, and a history and alternative statements section catalogues its somewhat convoluted history.

In the present note I want to explore in detail one of the (many) approaches to proving the law of quadratic reciprocity, an approach which uses Gauss sums and Legendre symbols. (In a later note I will explore another proof using quadratic Gauss sums which involves contour integration techniques from complex analysis).

The proof consists of three key theorems, as follows:

Theorem I. This proves that G(1, \chi)^2 = (-1|p)p when \chi = (r|p).

Theorem II. This proves that G(1, \chi)^{q-1} \equiv (q|p) (mod q) is equivalent to the law of quadratic reciprocity, using Theorem I.

Theorem III. This proves an identity for G(1, \chi)^{q-1} from which the congruence in Theorem II follows, thus completing the overall proof of the quadratic reciprocity law.

In a previous note I stated a version of the law of quadratic reciprocity due to Legendre as follows: if p and q are distinct odd primes then

(p|q) = \begin{cases} (q|p),& \text{if either } p \equiv 1 \ \text{(mod 4) } \text{or } q \equiv 1 \ \text{(mod 4)}\\ -(q|p), & \text{if } p \equiv q \equiv 3 \ \text{(mod 4)} \end{cases}

For the purposes of the proof in the present note it is necessary to express the quadratic reciprocity law as

(q|p) = (-1)^{(p - 1)(q - 1)/4}(p|q)

These two formulations are completely equivalent. To see this, note that if p \equiv 1 (mod 4) or q \equiv 1 (mod 4), the exponent on (-1) in the second formulation reduces to an even integer so we get (q|p) = (p|q). On the other hand, if p \equiv q \equiv 3 (mod 4), the exponent on (-1) in the second formulation reduces to an odd integer, so we get (q|p) = -(p|q).

Also note that the proof makes use of Gauss sums incorporating the Legendre symbol as the Dirichlet character in the summand, i.e., Gauss sums of the form

G(n, \chi) = \sum_{r \text{mod } p} \chi(r) e^{2 \pi i n r/p}

where \chi(r) = (r|p), and the Legendre symbol in this context is referred to as the quadratic character mod p. Since the modulus is prime, the Dirichlet character here is primitive and we have that G(n, \chi) is separable with

G(n, \chi) = \overline{\chi(n)} G(1, \chi) = (n|p) G(1, \chi)

for every n, because either gcd(n, p) = 1, or if gcd(n, p) > 1 we must have p|n in which case G(n, \chi) = 0 because e^{2 \pi i n r/p} = 1 and \sum_{r \text{mod } p} \chi(r) = 0 (for non-principal characters the rows of the character tables sum to zero).

Theorem I. If p is an odd prime and \chi(r) = (r|p) then

G(1, \chi)^2 = (-1|p) p

Proof: We have

G(1, \chi) = \sum_{r = 1}^{p - 1} (r|p) e^{2 \pi i r/p}

and therefore

G(1, \chi)^2 = \sum_{r = 1}^{p - 1} (r|p) e^{2 \pi i r/p} \times \sum_{s = 1}^{p - 1} (s|p) e^{2 \pi i s/p}

= \sum_{r = 1}^{p - 1} \sum_{s = 1}^{p - 1} (r|p) (s|p) e^{2 \pi i (r + s)/p}

For each pair of values of r and s there is a unique t mod p such that

s \equiv tr (mod p)

since this is a linear congruence with a unique solution. We also have that

(r|p) (s|p) = (r|p) (tr|p)

= (r|p) (r|p) (t|p)

= (r^2|p) (t|p)

= (t|p)

Therefore we can write

G(1, \chi)^2 = \sum_{r = 1}^{p - 1} \sum_{tr = 1}^{p - 1} (t|p) e^{2 \pi i r (1 + t)/p}

= \sum_{r = 1}^{p - 1} \sum_{t = 1}^{p - 1} (t|p) e^{2 \pi i r (1 + t)/p}

(where the index in the second summation has been reduced to t since t will range through all the least positive residues mod p independently of r)

= \sum_{t = 1}^{p - 1} (t|p) \sum_{r = 1}^{p - 1} e^{2 \pi i r (1 + t)/p}

The last sum on r is a geometric sum of the form

g(1 + t) = \sum_{r = 1}^{p - 1} x^r

where

x = e^{2 \pi i (1 + t)/p}

so we have

g(1 + t) = \begin{cases} \frac{x^p - x}{x - 1} & \text{if } x \neq 1\\ p - 1 & \text{if } x = 1 \end{cases}

But it must be the case that x^p = 1 (because x is a pth root of unity), and we also have that x = 1 if and only if p|(1 + t), so we can write

g(1 + t) = \begin{cases} -1 & \text{if } p \nmid (1 + t) \\ p - 1 & \text{if } p | (1 + t) \end{cases}

Therefore

G(1, \chi)^2 = - \sum_{t = 1}^{p - 2} (t|p) + (p - 1) (p - 1|p)

(because the only value of t for which p|(1 + t) is t = p - 1, so we can pull this out of the summation and then for this value of t the Legendre symbol (t|p) becomes (p - 1|p))

= - \sum_{t = 1}^{p - 2} (t|p) - (p - 1|p) + p(p - 1|p)

= - \sum_{t = 1}^{p - 1} (t|p) + p(p - 1|p)

= - \sum_{t = 1}^{p - 1} (t|p) + p(-1|p)

(since (p - 1|p) = (-1|p))

= (-1|p) p

(because \sum_{t = 1}^{p - 1} (t|p) = 0, since (t|p) is a non-principal Dirichlet character mod p and the rows of Dirichlet character tables sum to zero). \square
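Theorem I can be confirmed numerically for small primes; a short Python sketch (my own check):

```python
# Numerical check of Theorem I: G(1, chi)^2 = (-1|p) p.
import cmath

def legendre(r, p):
    r %= p
    if r == 0:
        return 0
    return 1 if pow(r, (p - 1) // 2, p) == 1 else -1

def G1(p):
    return sum(legendre(r, p) * cmath.exp(2j * cmath.pi * r / p) for r in range(1, p))

for p in (3, 5, 7, 11, 13, 17):
    assert abs(G1(p)**2 - legendre(-1, p) * p) < 1e-8
print("G(1, chi)^2 = (-1|p) p confirmed numerically")
```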

Since (-1|p) = \pm 1, Theorem I tells us that G(1, \chi)^2 is an integer and it then follows that

G(1, \chi)^{q - 1} = (G(1, \chi)^2)^{(q - 1)/2}

is also an integer for every odd q. It turns out that the law of quadratic reciprocity is intimately related to the value of the integer G(1, \chi)^{q - 1} mod q, which is what the next theorem shows.

Theorem II. Let p and q be distinct odd primes and let \chi be the quadratic character (i.e., the Legendre symbol) mod p. Then the quadratic reciprocity law

(q|p) = (-1)^{(p - 1)(q - 1)/4}(p|q)

is equivalent to the congruence

G(1, \chi)^{q - 1} \equiv (q|p) (mod q)

Proof: From the result proved in Theorem I we have that

G(1, \chi)^{q - 1} = (G(1, \chi)^2)^{(q - 1)/2}

= (-1|p)^{(q - 1)/2} p^{(q - 1)/2}

= (-1)^{(p - 1)(q - 1)/4} p^{(q - 1)/2}

where the last equality follows from property (e) of Legendre symbols in my previous note which implies

(-1|p) = (-1)^{(p - 1)/2}

By property (d) of Legendre symbols we also have

p^{(q - 1)/2} \equiv (p|q) (mod q)

so we can write

G(1, \chi)^{q - 1} \equiv (-1)^{(p - 1)(q - 1)/4} (p|q) = (q|p) (mod q)

where the last equality follows from the law of quadratic reciprocity.

Therefore if the law of quadratic reciprocity holds, then so does

G(1, \chi)^{q - 1} \equiv (q|p) (mod q)

Conversely, if this congruence holds then (q|p) \equiv (-1)^{(p - 1)(q - 1)/4} (p|q) (mod q), and since both sides are \pm 1 while q > 2, they must actually be equal, which is the law of quadratic reciprocity. \square
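Since Theorem I shows that G(1, \chi)^{q-1} is the integer ((-1|p)p)^{(q-1)/2}, the congruence in Theorem II can be tested directly. A quick Python sketch (my own check; this of course tests the congruence rather than proving it):

```python
# Check G(1, chi)^(q-1) = (q|p) (mod q) via the integer ((-1|p) p)^((q-1)/2).
def legendre(r, p):
    r %= p
    if r == 0:
        return 0
    return 1 if pow(r, (p - 1) // 2, p) == 1 else -1

for p in (3, 5, 7, 11, 13):
    for q in (3, 5, 7, 11, 13):
        if p == q:
            continue
        g_power = (legendre(-1, p) * p) ** ((q - 1) // 2)
        assert (g_power - legendre(q, p)) % q == 0
print("congruence holds for all tested pairs of distinct odd primes")
```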

The last stage of the proof is now to deduce the congruence in Theorem II from an identity for G(1, \chi)^{q - 1} which is established in the next theorem.

Theorem III. If p and q are distinct odd primes and if \chi is the quadratic character (i.e., Legendre symbol) mod p, we have

G(1, \chi)^{q - 1} = (q|p) \sum_{r_1 mod \ p} \cdots \sum_{r_q mod \ p} (r_1 \cdots r_q |p)

where the summation indices satisfy the restriction

r_1 + \cdots + r_q \equiv q (mod p)

Proof: It is easy to show that the Gauss sum G(n, \chi) is a periodic function of n with period p since

G(n + p, \chi) = \sum_{m = 1}^{p} \chi(m) e^{2 \pi i m (n + p)/p}

= \sum_{m = 1}^{p} \chi(m) e^{2 \pi i m n/p} e^{2 \pi i m}

= \sum_{m = 1}^{p} \chi(m) e^{2 \pi i m n/p} = G(n, \chi)

Since we then have

G(n, \chi)^q = G(n + p, \chi)^q

it follows that G(n, \chi)^q is also a periodic function of n with period p. Therefore we have a finite Fourier expansion

G(n, \chi)^q = \sum_{m \ mod \ p} a_q(m) e^{2 \pi i m n/p}

where the coefficients are given by

a_q(m) = \frac{1}{p} \sum_{n \ mod \ p} G(n, \chi)^q e^{-2 \pi i m n/p}

(see my previous note on finding the finite Fourier expansion of an arithmetical function). Simply from the general definition of G(n, \chi) (using Legendre symbols as Dirichlet characters) we have

G(n, \chi)^q = \sum_{r_1 mod \ p} (r_1|p) e^{2 \pi i n r_1/p} \cdots \sum_{r_q mod \ p} (r_q|p) e^{2 \pi i n r_q/p}

= \sum_{r_1 mod \ p} \cdots \sum_{r_q mod \ p} (r_1 \cdots r_q |p) e^{2 \pi i n (r_1 + \cdots + r_q)/p}

so we can write the above Fourier expansion coefficients as

a_q(m) = \frac{1}{p} \sum_{r_1 mod \ p} \cdots \sum_{r_q mod \ p} (r_1 \cdots r_q |p) \times \sum_{n \ mod \ p} e^{2 \pi i n (r_1 + \cdots + r_q - m)/p}

The sum on n is a geometric sum of the form

g(r_1 + \cdots + r_q - m) = \sum_{n = 0}^{p - 1} x^n

where

x = e^{2 \pi i (r_1 + \cdots + r_q - m)/p}

so we have

g(r_1 + \cdots + r_q - m) = \begin{cases} \frac{x^p - 1}{x - 1} & \text{if } x \neq 1\\ p & \text{if } x = 1 \end{cases}

= \begin{cases} 0 & \text{if } x \neq 1\\ p & \text{if } x = 1 \end{cases}

(since x^p = 1 because x is a pth root of unity). Therefore in the expression for a_q(m) the sum on n vanishes unless r_1 + \cdots + r_q \equiv m (mod p), in which case the sum is equal to p. Therefore we can write

a_q(m) = \sum_{r_1 mod \ p} \cdots \sum_{r_q mod \ p} (r_1 \cdots r_q |p)

where the summation indices satisfy the restriction

r_1 + \cdots + r_q \equiv m (mod p)

Now we return to the original expression for a_q(m), namely

a_q(m) = \frac{1}{p} \sum_{n \ mod \ p} G(n, \chi)^q e^{-2 \pi i m n/p}

and use this to obtain a different expression for a_q(m). The separability of G(n, \chi) means that

G(n, \chi) = (n|p) G(1, \chi)

We also have the result for odd q that

(n|p)^q = (n^q|p) = (n^{q-1}|p) (n|p) = (n|p)

(since q - 1 is even). Therefore we find

a_q(m) = \frac{1}{p} G(1, \chi)^q \sum_{n \ mod \ p} (n|p) e^{-2 \pi i m n/p}

= \frac{1}{p} G(1, \chi)^q G(-m, \chi)

= \frac{1}{p} G(1, \chi)^q (m|p) G(-1, \chi)

= (m|p) G(1, \chi)^{q - 1}

where the last equality follows from the fact that

G(1, \chi) G(-1, \chi) = G(1, \chi) (-1|p) G(1, \chi)

= G(1, \chi)^2 (-1|p)

= (-1|p) p (-1|p)

= ((-1)^2|p) p

= p

Therefore

(m|p) a_q(m) = (m|p) (m|p) G(1, \chi)^{q - 1}

= (m^2|p) G(1, \chi)^{q - 1}

= G(1, \chi)^{q - 1}

Taking m = q and comparing with the previously obtained expression for a_q(m), we get the claimed result

G(1, \chi)^{q - 1} = (q|p) \sum_{r_1 mod \ p} \cdots \sum_{r_q mod \ p} (r_1 \cdots r_q |p)

where the summation indices satisfy the restriction

r_1 + \cdots + r_q \equiv q (mod p). \square

We are now in a position to deduce the law of quadratic reciprocity from Theorems I, II and III. From the result obtained in Theorem II, it suffices to show that

\sum_{r_1 mod \ p} \cdots \sum_{r_q mod \ p} (r_1 \cdots r_q |p) \equiv 1 (mod q)

where the summation indices satisfy the restriction

r_1 + \cdots + r_q \equiv q (mod p)

i.e., every tuple (r_1, \ldots, r_q) appearing in the sum satisfies this restriction. One way in which this restriction is satisfied is when

r_j \equiv 1 (mod p)

for j = 1, \dots, q. In this case we have

(r_1 \cdots r_q |p) = (1|p) = 1

Every other possible way of satisfying the restriction involves

r_j \not \equiv r_k (mod p)

for some j \neq k (if all the r_j were congruent to a common value r, the restriction would give qr \equiv q (mod p) and hence r \equiv 1 (mod p), since q is invertible mod p). For each of these ways, every cyclic permutation of r_1, \ldots, r_q also satisfies the restriction and contributes the same summand (r_1 \cdots r_q |p), and since q is prime these q cyclic permutations are all distinct. Therefore each such summand appears q times and contributes 0 modulo q to the sum. Only the scenario r_j \equiv 1 (mod p) for j = 1, \dots, q yields a non-zero contribution, so the sum is congruent to 1 (mod q). This completes the proof of the law of quadratic reciprocity.

To clarify the last point, consider the following

Example: Take p = 5 and q = 3. Then the claim is that

\sum_{r_1 \ mod \ 5} \sum_{r_2 \ mod \ 5} \sum_{r_3 \ mod \ 5} (r_1 r_2 r_3|5) \equiv 1 (mod 3)

where the summation indices satisfy the restriction

r_1 + r_2 + r_3 \equiv 3 (mod 5)

In the case when r_1 \equiv r_2 \equiv r_3 \equiv 1 (mod 5) we have

(r_1 r_2 r_3|5) = (1|5) = 1

so this tuple contributes 1 to the sum.

Suppose we consider any other way of satisfying the restriction, say

r_1 \equiv 1, r_2 \equiv 3, r_3 \equiv 4 (mod 5)

so that

r_1 + r_2 + r_3 \equiv 8 \equiv 3 (mod 5)

Then the cyclic permutations

r_1 \equiv 4, r_2 \equiv 1, r_3 \equiv 3 (mod 5)

and

r_1 \equiv 3, r_2 \equiv 4, r_3 \equiv 1 (mod 5)

will also satisfy the restriction, and these contribute a total of

3 (1 \cdot 3 \cdot 4|5) \equiv 0 (mod 3)

to the sum. Therefore, modulo 3, only the first way of satisfying the restriction contributes, so the sum must be congruent to 1 (mod 3).
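The example is small enough to check by brute force. A Python sketch (my own addition) computes the full restricted sum and reduces it mod 3:

```python
# Brute-force check: sum of (r1 r2 r3 | 5) over triples with r1+r2+r3 = 3 (mod 5).
def legendre(r, p):
    r %= p
    if r == 0:
        return 0
    return 1 if pow(r, (p - 1) // 2, p) == 1 else -1

p, q = 5, 3
total = sum(legendre(r1 * r2 * r3, p)
            for r1 in range(p) for r2 in range(p) for r3 in range(p)
            if (r1 + r2 + r3) % p == q % p)
print(total, total % q)   # the sum itself is -5, which is congruent to 1 mod 3
```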

Advanced Number Theory Note #15: The Legendre symbol (a|p) as a Dirichlet character mod p

The Legendre symbol was introduced by the great French mathematician Adrien-Marie Legendre (the caricature shown here is the only known contemporary likeness of him). It has proved to be very useful as a shorthand for stating a number’s quadratic character and also in calculations involving it.

If p is an odd prime, then the Legendre symbol (a|p) = 1 if a is a quadratic residue of p, (a|p) = -1 if a is a quadratic non-residue of p, and (a|p) = 0 if a \equiv 0 (mod p).

The Legendre symbol has a number of well known properties which are useful for calculations and are summarised here for convenience:

[Table: properties (a)-(e) of the Legendre symbol]

[Formula: the quadratic character of 2]

[Statement: the law of quadratic reciprocity]

The last property, the law of quadratic reciprocity, is actually a deep result which has been studied in depth and proved in numerous different ways by Gauss and others. Indeed, it was for the purpose of finding his own proof of this result that Legendre invented the Legendre symbol. In a later note I will explore in detail a proof of the law of quadratic reciprocity using Gauss sums and Legendre symbols. This proof hinges on the fact that the Legendre symbol (a|p) is a Dirichlet character mod p. In the present short note I want to quickly show explicitly why this is the case by highlighting three key facts about Legendre symbols:

I. The Legendre symbol (a|p) is a completely multiplicative function of a.

II. The Legendre symbol (a|p) is periodic with period p.

III. The Legendre symbol vanishes when p|a.

Fact III follows immediately from the definition of Legendre symbols, and II is true because we have

a \equiv a +p (mod p)

and therefore (by property (a) above)

(a|p) = (a + p|p)

so the Legendre symbol is periodic with period p.

To prove I, observe that if p|a or p|b then ab \equiv 0 (mod p) so

(ab|p) = (a|p) \cdot (b|p) = 0

since at least one of (a|p) or (b|p) must be zero.

If p \nmid a and p \nmid b, then p \nmid ab and we have (by property (d) above)

(ab|p) \equiv (ab)^{(p-1)/2}

\equiv (a)^{(p-1)/2} \cdot (b)^{(p-1)/2}

\equiv (a|p) \cdot (b|p) (mod p)

Therefore

(ab|p) - (a|p) \cdot (b|p)

is divisible by p. But this difference can only take one of the values 0, 2 or -2, and the only one of these divisible by the odd prime p is 0, so the difference must be zero. The Legendre symbol is therefore completely multiplicative as claimed in I.

Since (a|p) is a completely multiplicative function of a which is periodic with period p and vanishes when p|a, it follows that (a|p) is a Dirichlet character \chi(a) mod p as claimed.
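Euler's criterion (property (d) above) gives a convenient way to compute (a|p) directly; the following short Python sketch (my own addition) does exactly that, and its output can be compared against the two examples worked out below:

```python
# Compute the Legendre symbol (a|p) via Euler's criterion: a^((p-1)/2) mod p.
def legendre(a, p):
    a %= p
    if a == 0:
        return 0
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1

for p in (7, 11):
    print(p, [legendre(a, p) for a in range(1, p)])
# 7  [1, 1, -1, 1, -1, -1]
# 11 [1, -1, 1, 1, 1, -1, -1, -1, 1, -1]
```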

I will illustrate this with two examples. First, let p = 7. We have

1^2 \equiv 1 (mod 7)

2^2 \equiv 4 (mod 7)

3^2 \equiv 2 (mod 7)

so the quadratic residues of 7 are 1, 2 and 4 and the quadratic non-residues are 3, 5 and 6 (the squares of 4, 5 and 6 repeat the values above, since (7 - r)^2 \equiv r^2 (mod 7)). The Legendre symbol (a|7) therefore takes the values

(1|7) = 1

(2|7) = 1

(3|7) = -1

(4|7) = 1

(5|7) = -1

(6|7) = -1

These are exactly the values of the fourth character in the Dirichlet character table mod 7:

[Table: Dirichlet character table mod 7]

Thus, (a|7) = \chi_4(a) mod 7.

For a second example, let p = 11. We have

1^2 \equiv 1 (mod 11)

2^2 \equiv 4 (mod 11)

3^2 \equiv 9 (mod 11)

4^2 \equiv 5 (mod 11)

5^2 \equiv 3 (mod 11)

so the quadratic residues of 11 are 1, 3, 4, 5 and 9 and the quadratic non-residues are 2, 6, 7, 8 and 10 (again the squares of 6 through 10 repeat the values above). The Legendre symbol (a|11) therefore takes the values

(1|11) = 1

(2|11) = -1

(3|11) = 1

(4|11) = 1

(5|11) = 1

(6|11) = -1

(7|11) = -1

(8|11) = -1

(9|11) = 1

(10|11) = -1

These are exactly the values of the sixth character in the Dirichlet character table mod 11:

[Table: Dirichlet character table mod 11]

Thus, (a|11) = \chi_6(a) mod 11.