A note on the structure of the higher-order terms in multivariate Taylor expansions

I recently had to think a bit more deeply than is usually necessary about the nature of the higher-order terms in multivariate Taylor expansions of smooth functions of the form f : \mathbb{R}^m \rightarrow \mathbb{R}. The result was a realisation that the fundamental structure of these higher-order terms is actually quite simple, and that working them out manually raises some interesting combinatorial issues, including the use of integer partitions from number theory. I want to record some of my thoughts about this here.

For a univariate function f : \mathbb{R} \rightarrow \mathbb{R}, Taylor’s theorem says that if f is n times differentiable at some point a \in \mathbb{R} then there exists a function h_n : \mathbb{R} \rightarrow \mathbb{R} such that

f(x) - f(a) =

f^{\prime}(a)(x - a) + \frac{1}{2!} f^{\prime \prime}(a)(x - a)^2 + \cdots + \frac{1}{n!}f^{(n)}(a)(x - a)^n + h_n(x)(x - a)^n

where h_n(x) \rightarrow 0 as x \rightarrow a. The n-th order Taylor polynomial

f(a) + f^{\prime}(a)(x - a) + \frac{1}{2!} f^{\prime \prime}(a)(x - a)^2 + \cdots + \frac{1}{n!}f^{(n)}(a)(x - a)^n

then serves as an approximation to the function f(x). If f is infinitely differentiable and its Taylor series converges to f (that is, if f is analytic), we can approximate f near a to any desired level of accuracy by choosing the Taylor polynomial to be of high enough degree.

In the univariate case the n-th order term in the Taylor polynomial is

\frac{1}{n!}f^{(n)}(a)(x - a)^n

In the present note I am interested in how this generalises to the multivariate case f : \mathbb{R}^m \rightarrow \mathbb{R}, where the Taylor expansion is about a vector \mathbf{a} = (a_1, a_2, \ldots, a_m). In the bivariate case m = 2, where the Taylor expansion is about the point \mathbf{a} = (a_1, a_2), the n-th order term in the Taylor expansion is given by the formula

\frac{1}{n!}\sum_{k=0}^{n} \binom{n}{k} \frac{\partial^n f}{\partial x_1^{n-k} \partial x_2^{k}}\bigg|_{\mathbf{a}} (x_1 - a_1)^{n-k}(x_2 - a_2)^k

This enables any higher-order term in the Taylor polynomial to be quickly obtained (and notice that the binomial coefficient values can be quickly read from the appropriate line of Pascal’s triangle). For example, the fourth-order term would be

\frac{1}{4!} \bigg[\frac{\partial^4f}{\partial x_1^4}\bigg|_{\mathbf{a}} (x_1 - a_1)^4 + 4 \frac{\partial^4 f}{\partial x_1^3 \partial x_2}\bigg|_{\mathbf{a}} (x_1 - a_1)^3 (x_2 - a_2) \ \ +

6 \frac{\partial^4 f}{\partial x_1^2 \partial x_2^2}\bigg|_{\mathbf{a}} (x_1 - a_1)^2 (x_2 - a_2)^2 + 4 \frac{\partial^4 f}{\partial x_1 \partial x_2^3}\bigg|_{\mathbf{a}} (x_1 - a_1) (x_2 - a_2)^3 \  \  +

\frac{\partial^4f}{\partial x_2^4}\bigg|_{\mathbf{a}} (x_2 - a_2)^4\bigg]
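This binomial-sum formula is easy to check symbolically. The following sketch (assuming SymPy is available; the test function \exp(x_1 + 2x_2) and the expansion point \mathbf{a} = (0, 0) are my own choices, not from the text) computes the fourth-order term and compares it with the known expansion of the exponential, whose fourth-order term must be (x_1 + 2x_2)^4/4!:

```python
import sympy as sp
from math import comb, factorial

x1, x2 = sp.symbols('x1 x2')
f = sp.exp(x1 + 2*x2)   # hypothetical test function, expanded about a = (0, 0)
n = 4

# nth-order term: (1/n!) * sum_k C(n,k) d^n f / dx1^(n-k) dx2^k |_a * v1^(n-k) v2^k
term = sp.Rational(1, factorial(n)) * sum(
    comb(n, k)
    * sp.diff(f, x1, n - k, x2, k).subs({x1: 0, x2: 0})
    * x1**(n - k) * x2**k
    for k in range(n + 1)
)

# for exp(x1 + 2*x2) the whole series is sum_n (x1 + 2*x2)**n / n!,
# so the fourth-order term should equal (x1 + 2*x2)**4 / 4!
assert sp.expand(term - (x1 + 2*x2)**4 / factorial(4)) == 0
```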

In order to generalise this as easily as possible to multivariate cases with m > 2, I found it convenient to adopt the following notation:

\partial_i \equiv \frac{\partial f}{\partial x_i} \bigg|_{\mathbf{a}}

\partial_{ij} \equiv \frac{\partial^2 f}{\partial x_i \partial x_j} \bigg|_{\mathbf{a}}

\partial_{ijk} \equiv \frac{\partial^3 f}{\partial x_i \partial x_j \partial x_k} \bigg|_{\mathbf{a}}

and so on, and

v_i \equiv (x_i - a_i)

Given the function f : \mathbb{R}^m \rightarrow \mathbb{R}, we allow the indices above to take values from 1 to m. To my surprise, I then realised that if we use the Einstein summation convention, it is possible to represent any Taylor expansion as

f(\mathbf{x}) = f(\mathbf{a}) + \partial_i v_i + \frac{1}{2!} \partial_{ij} v_i v_j + \frac{1}{3!} \partial_{ijk} v_i v_j v_k + \frac{1}{4!} \partial_{ijkl} v_i v_j v_k v_l + \cdots

This generic formula will apply to any number of variables m. Notice that the number of indices in each term corresponds to the order of that term. So, for example, the fifth-order term in the Taylor expansion will involve five indices, the sixth-order term will involve six indices, and so on.

To test out this formula, let us use it to work out the fourth-order term for the bivariate case m = 2 again. The relevant term in the above form of the Taylor expansion is

\frac{1}{4!} \partial_{ijkl} v_i v_j v_k v_l

Since each of the indices i, j, k, l will be allowed to take values 1 or 2, the above expression represents a sum involving 2^4 = 16 terms. In general, the n-th order term in the Taylor expansion of a function f : \mathbb{R}^m \rightarrow \mathbb{R} will involve m^n terms. The 16 terms in the above case are

\frac{1}{4!} \bigg[ \partial_{1111} v_1 v_1 v_1 v_1 + \partial_{1112} v_1 v_1 v_1 v_2 + \partial_{1121} v_1 v_1 v_2 v_1 + \partial_{1122} v_1 v_1 v_2 v_2 \ +

\partial_{1211} v_1 v_2 v_1 v_1 + \partial_{1212} v_1 v_2 v_1 v_2 + \partial_{1221} v_1 v_2 v_2 v_1 + \partial_{1222} v_1 v_2 v_2 v_2 \ +

\partial_{2111} v_2 v_1 v_1 v_1 + \partial_{2112} v_2 v_1 v_1 v_2 + \partial_{2121} v_2 v_1 v_2 v_1 + \partial_{2122} v_2 v_1 v_2 v_2 \ +

\partial_{2211} v_2 v_2 v_1 v_1 + \partial_{2212} v_2 v_2 v_1 v_2 + \partial_{2221} v_2 v_2 v_2 v_1 + \partial_{2222} v_2 v_2 v_2 v_2\bigg]

However, due to the equality of cross-partials involving the same index numbers, this reduces to

\frac{1}{4!}\bigg[\partial_{1111} v_1^4 + 4 \partial_{1112} v_1^3 v_2 + 6 \partial_{1122} v_1^2 v_2^2 + 4 \partial_{1222} v_1 v_2^3 + \partial_{2222} v_2^4\bigg]

which matches what we previously obtained using the formula.
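The agreement between the index-summation form and the binomial formula can also be confirmed by brute force. The sketch below (assuming SymPy; the test function \sin x_1 \, e^{x_2} is an arbitrary choice of mine) evaluates \frac{1}{4!} \partial_{ijkl} v_i v_j v_k v_l by literally summing over all 2^4 = 16 index tuples and compares the result with the binomial-coefficient form:

```python
import itertools
import sympy as sp
from math import comb, factorial

m, n = 2, 4
x = sp.symbols('x1:3')                  # (x1, x2); expand about a = (0, 0)
f = sp.sin(x[0]) * sp.exp(x[1])         # hypothetical smooth test function
at_a = {v: 0 for v in x}

# (1/4!) * partial_{ijkl} v_i v_j v_k v_l, summed over all m**n index tuples
brute = sum(
    sp.diff(f, *(x[i] for i in idx)).subs(at_a) * sp.Mul(*(x[i] for i in idx))
    for idx in itertools.product(range(m), repeat=n)
) / factorial(n)

# binomial-coefficient form of the same fourth-order term
binom = sum(
    comb(n, k) * sp.diff(f, x[0], n - k, x[1], k).subs(at_a)
    * x[0]**(n - k) * x[1]**k
    for k in range(n + 1)
) / factorial(n)

assert sp.expand(brute - binom) == 0
```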

In a similar way, we could proceed to systematically calculate the fourth-order term in the trivariate case m = 3. The relevant term in the generic Taylor expansion is the same one as before,  namely

\frac{1}{4!} \partial_{ijkl} v_i v_j v_k v_l

However, since each of the indices will take values from 1 to 3 in the trivariate case, the above expression now represents a sum of 3^4 = 81 terms. The 81 terms are

\frac{1}{4!} \bigg[\partial_{1111} v_1 v_1 v_1 v_1 + \partial_{1121} v_1 v_1 v_2 v_1 + \partial_{1131} v_1 v_1 v_3 v_1 \ +

\partial_{1112} v_1 v_1 v_1 v_2 + \partial_{1122} v_1 v_1 v_2 v_2 + \partial_{1132} v_1 v_1 v_3 v_2 \ +

\partial_{1113} v_1 v_1 v_1 v_3 + \partial_{1123} v_1 v_1 v_2 v_3 + \partial_{1133} v_1 v_1 v_3 v_3 \ +

\partial_{1211} v_1 v_2 v_1 v_1 + \partial_{1221} v_1 v_2 v_2 v_1 + \partial_{1231} v_1 v_2 v_3 v_1 \ +

\partial_{1212} v_1 v_2 v_1 v_2 + \partial_{1222} v_1 v_2 v_2 v_2 + \partial_{1232} v_1 v_2 v_3 v_2 \ +

\partial_{1213} v_1 v_2 v_1 v_3 + \partial_{1223} v_1 v_2 v_2 v_3 + \partial_{1233} v_1 v_2 v_3 v_3 \ +

\partial_{1311} v_1 v_3 v_1 v_1 + \partial_{1321} v_1 v_3 v_2 v_1 + \partial_{1331} v_1 v_3 v_3 v_1 \ +

\partial_{1312} v_1 v_3 v_1 v_2 + \partial_{1322} v_1 v_3 v_2 v_2 + \partial_{1332} v_1 v_3 v_3 v_2 \ +

\partial_{1313} v_1 v_3 v_1 v_3 + \partial_{1323} v_1 v_3 v_2 v_3 + \partial_{1333} v_1 v_3 v_3 v_3 \ +

\partial_{2111} v_2 v_1 v_1 v_1 + \partial_{2121} v_2 v_1 v_2 v_1 + \partial_{2131} v_2 v_1 v_3 v_1 \ +

\partial_{2112} v_2 v_1 v_1 v_2 + \partial_{2122} v_2 v_1 v_2 v_2 + \partial_{2132} v_2 v_1 v_3 v_2 \ +

\partial_{2113} v_2 v_1 v_1 v_3 + \partial_{2123} v_2 v_1 v_2 v_3 + \partial_{2133} v_2 v_1 v_3 v_3 \ +

\partial_{2211} v_2 v_2 v_1 v_1 + \partial_{2221} v_2 v_2 v_2 v_1 + \partial_{2231} v_2 v_2 v_3 v_1 \ +

\partial_{2212} v_2 v_2 v_1 v_2 + \partial_{2222} v_2 v_2 v_2 v_2 + \partial_{2232} v_2 v_2 v_3 v_2 \ +

\partial_{2213} v_2 v_2 v_1 v_3 + \partial_{2223} v_2 v_2 v_2 v_3 + \partial_{2233} v_2 v_2 v_3 v_3 \ +

\partial_{2311} v_2 v_3 v_1 v_1 + \partial_{2321} v_2 v_3 v_2 v_1 + \partial_{2331} v_2 v_3 v_3 v_1 \ +

\partial_{2312} v_2 v_3 v_1 v_2 + \partial_{2322} v_2 v_3 v_2 v_2 + \partial_{2332} v_2 v_3 v_3 v_2 \ +

\partial_{2313} v_2 v_3 v_1 v_3 + \partial_{2323} v_2 v_3 v_2 v_3 + \partial_{2333} v_2 v_3 v_3 v_3 \ +

\partial_{3111} v_3 v_1 v_1 v_1 + \partial_{3121} v_3 v_1 v_2 v_1 + \partial_{3131} v_3 v_1 v_3 v_1 \ +

\partial_{3112} v_3 v_1 v_1 v_2 + \partial_{3122} v_3 v_1 v_2 v_2 + \partial_{3132} v_3 v_1 v_3 v_2 \ +

\partial_{3113} v_3 v_1 v_1 v_3 + \partial_{3123} v_3 v_1 v_2 v_3 + \partial_{3133} v_3 v_1 v_3 v_3 \ +

\partial_{3211} v_3 v_2 v_1 v_1 + \partial_{3221} v_3 v_2 v_2 v_1 + \partial_{3231} v_3 v_2 v_3 v_1 \ +

\partial_{3212} v_3 v_2 v_1 v_2 + \partial_{3222} v_3 v_2 v_2 v_2 + \partial_{3232} v_3 v_2 v_3 v_2 \ +

\partial_{3213} v_3 v_2 v_1 v_3 + \partial_{3223} v_3 v_2 v_2 v_3 + \partial_{3233} v_3 v_2 v_3 v_3 \ +

\partial_{3311} v_3 v_3 v_1 v_1 + \partial_{3321} v_3 v_3 v_2 v_1 + \partial_{3331} v_3 v_3 v_3 v_1 \ +

\partial_{3312} v_3 v_3 v_1 v_2 + \partial_{3322} v_3 v_3 v_2 v_2 + \partial_{3332} v_3 v_3 v_3 v_2 \ +

\partial_{3313} v_3 v_3 v_1 v_3 + \partial_{3323} v_3 v_3 v_2 v_3 + \partial_{3333} v_3 v_3 v_3 v_3 \bigg]

Again due to equality of cross-partials involving the same index numbers, this reduces to

\frac{1}{4!} \bigg[ \partial_{1111} v_1^4 + \partial_{2222} v_2^4 + \partial_{3333} v_3^4 + 4 \partial_{1112} v_1^3 v_2 + 4 \partial_{1113} v_1^3 v_3 +

4 \partial_{2223} v_2^3 v_3 + 4 \partial_{1222} v_1 v_2^3 + 4 \partial_{1333} v_1 v_3^3 + 4 \partial_{2333} v_2 v_3^3 +

6 \partial_{1122} v_1^2 v_2^2 + 6 \partial_{1133} v_1^2 v_3^2 + 6 \partial_{2233} v_2^2 v_3^2 + 12 \partial_{1123} v_1^2 v_2 v_3 + 12 \partial_{1223} v_1 v_2^2 v_3 + 12 \partial_{1233} v_1 v_2 v_3^2 \bigg]

Note that the coefficient of each of the terms is the multinomial coefficient

\binom{k}{k_1, k_2, \ldots, k_m} = \frac{k!}{k_1! k_2! \dots k_m!}

where k_1 + k_2 + \cdots + k_m = k. This gives the number of ordered arrangements of k objects in which there are k_1 objects of type 1, k_2 objects of type 2, \ldots, k_m objects of type m.

In our case we have k = 4 and m = 3. Thus, the coefficient of \partial_{1122} v_1^2 v_2^2 for example is 6, because there are two 1s, two 2s and zero 3s, and setting k_1 = 2, k_2 = 2, k_3 = 0 we get

\binom{k}{k_1, k_2, k_3} = \frac{k!}{k_1! k_2! k_3!} = \frac{4!}{2! 2! 0!} = \frac{24}{4} = 6

Similarly, in the case of \partial_{1123} v_1^2 v_2 v_3 the coefficient is 12 because k_1 = 2, k_2 = 1, k_3 = 1, so

\binom{k}{k_1, k_2, k_3} = \frac{4!}{2! 1! 1!} = \frac{24}{2} = 12
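These groupings and coefficients can be verified mechanically: grouping the 3^4 = 81 index tuples by their sorted form, each distinct sorted tuple is one type of term and its count is exactly the corresponding multinomial coefficient. A minimal sketch using only the standard library:

```python
from collections import Counter
from itertools import product

m, n = 3, 4
# group the m**n index tuples by sorted form; each distinct sorted tuple is
# one 'type' of term, and its count is the multinomial coefficient
counts = Counter(tuple(sorted(idx)) for idx in product(range(1, m + 1), repeat=n))

assert sum(counts.values()) == m**n       # 81 raw terms in total
assert len(counts) == 15                  # 15 distinct types of terms
assert counts[(1, 1, 2, 2)] == 6          # coefficient of the v1^2 v2^2 term
assert counts[(1, 1, 2, 3)] == 12         # coefficient of the v1^2 v2 v3 term
```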

Note also that within the square brackets in the reduced expression above there are 15 different `types’ of terms. Why 15? The answer has to do with the concept of partitions from number theory. A partition of a positive integer is a way of writing that integer as a sum of positive integers. Two sums that differ only in the order of the integers are considered to be the same partition. Now, the number 4 has five partitions, namely

4

3 + 1

2 + 2

2 + 1 + 1

1 + 1 + 1 + 1

In the multinomial coefficients in the square brackets in the reduced expression above we are trying to obtain the sum k = 4 in all possible ways by assigning values to k_1, k_2 and k_3. There are 3 ways of achieving this with the partition 4 (one example is k_1 = 4, k_2 = 0, k_3 = 0); there are 6 ways of achieving this with the partition 3 + 1 (one example is k_1 = 3, k_2 = 1, k_3 = 0); there are 3 ways of achieving this with the partition 2 + 2 (one example is k_1 = 2, k_2 = 2, k_3 = 0); there are 3 ways of achieving this with the partition 2 + 1 + 1 (one example is k_1 = 2, k_2 =1, k_3 = 1); and finally it is impossible to use the partition 1 + 1 + 1 + 1 since we are only summing three terms. Thus there are 15 ways to achieve the sum k = 4 by applying the partitions of 4 to the values of k_1, k_2 and k_3, which explains why there are 15 terms in the square brackets in the reduced expression above.
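The counting argument above can be reproduced by enumerating all assignments of values to k_1, k_2, k_3 summing to 4 and grouping them by their underlying partition. A sketch using only the standard library:

```python
from collections import Counter
from itertools import product

k, m = 4, 3
# all ways of writing k_1 + k_2 + k_3 = 4 with k_i >= 0, grouped by the
# underlying partition (the nonzero parts, in decreasing order)
by_partition = Counter(
    tuple(sorted((p for p in comp if p > 0), reverse=True))
    for comp in product(range(k + 1), repeat=m)
    if sum(comp) == k
)

assert by_partition[(4,)] == 3
assert by_partition[(3, 1)] == 6
assert by_partition[(2, 2)] == 3
assert by_partition[(2, 1, 1)] == 3
assert (1, 1, 1, 1) not in by_partition   # impossible with only three k_i
assert sum(by_partition.values()) == 15
```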

Can we use these insights to quickly work out the fourth-order term in the Taylor expansion of the quadvariate function f : \mathbb{R}^4 \rightarrow \mathbb{R}? Since m = 4 and n = 4, this fourth-order term will be a sum of 4^4 = 256 terms. Can we work out how many terms there will be in the reduced expression in which equivalent cross-partials are grouped together? Yes we can, using the partitions of 4 again. In the multinomial coefficients in the square brackets in the reduced expression we will be trying to obtain the sum k = 4 in all possible ways by assigning values to k_1, k_2, k_3 and k_4. There are 4 ways of achieving this with the partition 4 (one example is k_1 = 4, k_2 = 0, k_3 = 0, k_4 = 0); there are 12 ways of achieving this with the partition 3 + 1 (one example is k_1 = 3, k_2 = 1, k_3 = 0, k_4 = 0); there are 6 ways of achieving this with the partition 2 + 2 (one example is k_1 = 2, k_2 = 2, k_3 = 0, k_4 = 0); there are 12 ways of achieving this with the partition 2 + 1 + 1 (one example is k_1 = 2, k_2 =1, k_3 = 1, k_4 = 0); and finally there is exactly one way to use the partition 1 + 1 + 1 + 1, namely k_1 = 1, k_2 = 1, k_3 = 1, k_4 = 1. Thus there are 35 ways to achieve the sum k = 4 by applying the partitions of 4 to the values of k_1, k_2, k_3 and k_4, so there should be 35 different types of terms in the reduced expression. These will be

\frac{1}{4!} \bigg[\partial_{1111} v_1^4 + \partial_{2222} v_2^4 + \partial_{3333} v_3^4 + \partial_{4444} v_4^4 \ +

4 \partial_{1112} v_1^3 v_2 + 4 \partial_{1222} v_1 v_2^3 + 4 \partial_{2223} v_2^3 v_3 \ +

4 \partial_{3444} v_3 v_4^3 + 4 \partial_{1113} v_1^3 v_3 + 4 \partial_{1333} v_1 v_3^3 \ +

4 \partial_{2333} v_2 v_3^3 + 4 \partial_{2224} v_2^3 v_4 + 4 \partial_{1114} v_1^3 v_4 \ +

4 \partial_{1444} v_1 v_4^3 + 4 \partial_{3334} v_3^3 v_4 + 4 \partial_{2444} v_2 v_4^3 \ +

6 \partial_{1122} v_1^2 v_2^2 + 6 \partial_{2233} v_2^2 v_3^2 + 6 \partial_{1133} v_1^2 v_3^2 \ +

6 \partial_{2244} v_2^2 v_4^2 + 6 \partial_{1144} v_1^2 v_4^2 + 6 \partial_{3344} v_3^2 v_4^2 \ +

12 \partial_{1123} v_1^2 v_2 v_3 + 12 \partial_{1124} v_1^2 v_2 v_4 + 12 \partial_{1134} v_1^2 v_3 v_4 \ +

12 \partial_{1223} v_1 v_2^2 v_3 + 12 \partial_{1224} v_1 v_2^2 v_4 + 12 \partial_{1334} v_1 v_3^2 v_4 \ +

12 \partial_{1344} v_1 v_3 v_4^2 + 12 \partial_{1244} v_1 v_2 v_4^2 + 12 \partial_{1233} v_1 v_2 v_3^2 \ +

12 \partial_{2334} v_2 v_3^2 v_4 + 12 \partial_{2234} v_2^2 v_3 v_4 + 12 \partial_{2344} v_2 v_3 v_4^2 \ +

24 \partial_{1234} v_1 v_2 v_3 v_4 \bigg]

The coefficient on each of these terms is the relevant multinomial coefficient, and note that these coefficients add up to 256. The following are examples of how I calculated the multinomial coefficients in this quadvariate case:

For the term involving \partial_{1112} we have k_1 = 3, k_2 = 1, k_3 = 0, k_4 = 0, so

\binom{k}{k_1, k_2, k_3, k_4} = \frac{4!}{3! 1! 0! 0!} = \frac{24}{6} = 4

For the term involving \partial_{1122} we have k_1 = 2, k_2 = 2, k_3 = 0, k_4 = 0, so

\binom{k}{k_1, k_2, k_3, k_4} = \frac{4!}{2! 2! 0! 0!} = \frac{24}{4} = 6

For the term involving \partial_{1123} we have k_1 = 2, k_2 = 1, k_3 = 1, k_4 = 0, so

\binom{k}{k_1, k_2, k_3, k_4} = \frac{4!}{2! 1! 1! 0!} = \frac{24}{2} = 12

Finally, for the term involving \partial_{1234} we have k_1 = 1, k_2 = 1, k_3 = 1, k_4 = 1, so

\binom{k}{k_1, k_2, k_3, k_4} = \frac{4!}{1! 1! 1! 1!} = \frac{24}{1} = 24
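All of the quadvariate bookkeeping above (the 35 types of terms, the individual multinomial coefficients, and the fact that the coefficients sum to 256) can be checked with a few lines of code:

```python
from math import factorial
from itertools import product

def multinomial(ks):
    """Multinomial coefficient k! / (k_1! k_2! ... k_m!)."""
    out = factorial(sum(ks))
    for k in ks:
        out //= factorial(k)
    return out

assert multinomial((3, 1, 0, 0)) == 4
assert multinomial((2, 2, 0, 0)) == 6
assert multinomial((2, 1, 1, 0)) == 12
assert multinomial((1, 1, 1, 1)) == 24

# over all 35 types (assignments of k_1..k_4 summing to 4), the
# multinomial coefficients account for every one of the 4**4 = 256 raw terms
types = [c for c in product(range(5), repeat=4) if sum(c) == 4]
assert len(types) == 35
assert sum(multinomial(c) for c in types) == 256
```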

A study of Euler’s untranslated paper `De integratione aequationum differentialium’

The following are some translations and annotations I made while studying a previously untranslated paper in Latin by Leonhard Euler (1707-1783) entitled De integratione aequationum differentialium (On the integration of differential equations), 1763. (Eneström index number E269, obtained from the Euler archive). This contains, among other things, some interesting treatments of linear first-order equations as well as nonlinear first-order equations such as Bernoulli’s equation and Riccati’s equation. I was curious to see what Euler had to say about these, particularly Riccati’s equation. (Jacopo Francesco Riccati (1676-1754) was a Venetian mathematician who introduced an important class of nonlinear first-order differential equations which now bear his name, and which Euler investigated in greater depth).

To begin with, I will translate the first paragraph of the introduction to the paper as it gives a nice overview of the central idea as well as giving a flavour of Euler’s writing style:

1.

I consider herein differential equations of first order, involving only two variables, which one may represent in the general form M dx + N dy = 0, where M and N denote any functions of a pair of variables x and y. It is demonstrated that such equations always imply a certain relationship between the variables x and y, such that for any given value of one of them, the values for the other can be determined. Since moreover this definite relationship between the two variables is to be found through integration, the integral equation, regarded in its greatest generality, will accommodate a new constant quantity which, provided that it depends completely on our choice, enables, as it were, an infinite number of integral equations to be encompassed, which should all be equally compatible with the differential equation.

Mathematical comments: What Euler is essentially saying in this first paragraph is that he is going to consider methods for integrating first-order differential equations of the form M(x, y) dx + N(x, y) dy = 0 in order to obtain corresponding relationships between the variables x and y which he calls `integral equations’ (aequationes integrales) of the form V(x, y) = Const. The term on the right-hand side is an arbitrary constant and no matter what value is assigned to this constant, the differential of this `integral equation’ will always be of the same form M dx + N dy = 0. \square

After some further introductory comments, the rest of Euler’s long paper consists of a multitude of problems, theorems, corollaries and examples concerning differential equations of this type. The first problem that Euler considers is important because it more or less sets the scene for the rest of the paper:

Problem 1.

7. Given that the differential equation M dx + N dy = 0 is such that \big(\frac{d M}{d y} \big) = \big(\frac{d N}{d x} \big), find its integral equation.   

Solution.

If it is to be that \big(\frac{d M}{d y} \big) = \big(\frac{d N}{d x} \big), then a function must exist defined over a pair of variables x and y, which when it is differentiated yields M dx + N dy. Let V be this function, and given that dV = M dx + N dy, M dx will be the differential of V when only the variable x is considered, and N dy will be its differential when only the variable y is considered. From here therefore V will be found, if either M dx is integrated with y regarded as a constant, or N dy is integrated with x regarded as a constant: and in this way the task is reduced to the integration of a differential formula involving a single variable, which might be achieved either by algebraic methods, or else quadrature of curves might be needed. But since it is the case that the quantity V can be found in two ways, and one integration treats as a constant any function whatever of y, while the other treats as a constant any function whatever of x, so it may then be that both V = \int M dx + Y and V = \int N dy + X hold, and it is always permissible to define these functions Y of y, and X of x, such that \int M dx + Y = \int N dy + X, and this is in all cases readily guaranteed. When this is done it is evident that since the quantity V is to be the integral of the formula M dx + N dy, the complete integral equation of the proposed equation M dx + N dy = 0 will then be V = Const., as this involves a constant which is dependent on our choice. 

Mathematical comments: The first thing to observe here is that Euler uses the notation \big(\frac{d M}{d y} \big) and \big(\frac{d N}{d x} \big) to represent the partial derivatives \frac{\partial M}{\partial y} and \frac{\partial N}{\partial x}. At the start of the solution he says that if \frac{\partial M}{\partial y} = \frac{\partial N}{\partial x} then a function V(x, y) must exist which yields M dx + N dy when it is totally differentiated. The reason for this immediately becomes clear if we totally differentiate V(x, y) to get dV = \frac{\partial V}{\partial x} dx + \frac{\partial V}{\partial y} dy. We see that M = \frac{\partial V}{\partial x} and N = \frac{\partial V}{\partial y} so the condition \frac{\partial M}{\partial y} = \frac{\partial N}{\partial x} simply amounts to saying that the cross-partials of V are equal, i.e., \frac{\partial^2 V}{\partial x \partial y} = \frac{\partial^2 V}{\partial y \partial x}. This must be true for any well-behaved function. So if the condition \frac{\partial M}{\partial y} = \frac{\partial N}{\partial x} holds it will always be possible to find a function V whose total differential is M dx + N dy. Conversely, if it is not true that \frac{\partial M}{\partial y} = \frac{\partial N}{\partial x}, there can be no function whose total differential is M dx + N dy. Euler deals with this latter case in a subsequent problem (see below). Since M = \frac{\partial V}{\partial x} we see that V = \int M dx + Y where Y is any function of y. Similarly, since N = \frac{\partial V}{\partial y} we have that V = \int N dy + X where X is any function of x. Equating these two expressions for V we can then work out what the functions X and Y must be in order to make \int M dx + Y and \int N dy + X mutually compatible. This is essentially the strategy that Euler is advocating here. \square

Euler provides a number of examples to illustrate the above procedure. The following is the first one:

Example 1.

12. Integrate this differential equation:

2axy dx + axx dy - y^3 dx - 3xyy dy = 0

The equation compared with the form M dx + N dy = 0 implies

M = 2axy - y^3 and N = axx - 3xyy.

Then the first thing to be considered is whether in this case the problem can be solved? To which end we inquire as to the values of:

\big(\frac{d M}{d y}\big) = 2ax - 3yy, and \big(\frac{d N}{d x}\big) = 2ax - 3yy,

which since they are equal, the suggested method must necessarily succeed. Then it will be discovered, with y taken to be constant:

\int M dx = axxy - y^3 x + Y

which is of such a form that if the differential is taken, treating x as a constant, it will yield:

axx dy - 3yyx dy + dY = N dy,

and upon replacing N with its value axx - 3xyy, the result will be dY = 0, from which we get Y = 0, or Y = const. Hence we will have the required integral equation:

axxy - y^3 x = Const.
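Euler's worked example is easily verified symbolically. The sketch below (assuming SymPy is available) checks the exactness condition and confirms that the function V = ax^2y - xy^3 in Euler's integral equation really does have M dx + N dy as its total differential:

```python
import sympy as sp

x, y, a = sp.symbols('x y a')
M = 2*a*x*y - y**3
N = a*x**2 - 3*x*y**2

# exactness condition (dM/dy) = (dN/dx)
assert sp.simplify(sp.diff(M, y) - sp.diff(N, x)) == 0

# Euler's integral equation V = a x^2 y - x y^3 = Const.
V = a*x**2*y - x*y**3
assert sp.simplify(sp.diff(V, x) - M) == 0   # dV/dx recovers M
assert sp.simplify(sp.diff(V, y) - N) == 0   # dV/dy recovers N
```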

In the next extract, Euler introduces an integrating factor method to deal with situations in which the condition \frac{\partial M}{\partial y} = \frac{\partial N}{\partial x} does not hold:

Theorem.

16. If for the differential equation M dx + N dy = 0 it is not the case that \big(\frac{d M}{d y}\big) = \big(\frac{d N}{d x}\big), a multiplier is always available by means of which multiplication of the formula M dx + N dy will make it become integrable. 

Proof.

Since it is not the case that \big(\frac{d M}{d y}\big) = \big(\frac{d N}{d x}\big), also the formula M dx + N dy will not be integrable, or rather a function of x and y does not exist whose differential is M dx + N dy. In truth however it is not the formula M dx + N dy, but the equation M dx + N dy = 0, whose integral is being sought; and since this equation still stands if it is multiplied by any function L of x and y, so that it may be written L M dx + L N dy = 0, it is demonstrated that it is always possible to find a function L of this kind, such that the formula L M dx + L N dy = 0 will be integrable. For this to come about it will be necessary that:

\big(\frac{d LM}{d y}\big) = \big(\frac{d LN}{d x}\big)

or if it is supposed that dL = P dx + Q dy, then since \big(\frac{d L}{d y}\big) = Q, and \big(\frac{d L}{d x}\big) = P, the function L should be constructed so that:

L \big(\frac{d M}{d y}\big) + M Q = L \big(\frac{d N}{d x}\big) + NP

But it is apparent that this condition is sufficient to define a function L, which when it multiplies the formula M dx + N dy will make it integrable. 

Mathematical comments: What Euler is doing here is deriving a procedure for finding an integrating factor L(x, y) which will make LM dx + LN d y = 0 integrable. If this is integrable there will be a function V(x, y) whose total differential is LM dx + LN d y. But the total differential of V is d V = \frac{\partial V}{\partial x} dx + \frac{\partial V}{\partial y} dy from which we see that

\frac{\partial V}{\partial x} = LM and \frac{\partial V}{\partial y} = LN

Therefore Euler’s condition

\frac{\partial LM}{\partial y} = \frac{\partial LN}{\partial x}

again amounts to saying that the cross-partials of V must be equal. Expanding Euler’s condition we get

L \frac{\partial M}{\partial y} + M \frac{\partial L}{\partial y} = L \frac{\partial N}{\partial x} + N \frac{\partial L}{\partial x}

which when it is assumed that dL = P dx + Q dy we can write as

L \frac{\partial M}{\partial y} + MQ = L \frac{\partial N}{\partial x} + NP

This is the equation at the end of Euler’s proof and we can see that it is nothing more than a statement of the equality of the cross-partials of V. Euler does not explicitly take the final step, saying simply that it is evident from this that the required function L can be defined, but what he means is that L can now be defined in terms of the functions M, N, P and Q as

L = \frac{NP - MQ}{\frac{\partial M}{\partial y} - \frac{\partial N}{\partial x}}

When M dx + N dy is multiplied through by a function L defined in this way, the result will be a differential equation that is integrable. \square

I will now jump ahead in Euler’s paper to an application of the above procedure in the familiar case of a linear first-order differential equation:

Problem 4.

34. If the proposed differential equation is to be 

P dx + Q y dx + R dy = 0

where P, Q and R denote any functions whatever of x, and moreover the other variable y does not have more than one dimension, find the multiplier which makes it integrable. 

Solution.

Comparison of this equation with the form M dx + N dy implies M = P + Qy and N = R, and so we get 

\big(\frac{d M}{d y}\big) = Q and \big(\frac{d N}{d x}\big) = \frac{d R}{d x}

If now L is to be set up as the multiplier in question, let dL = p dx + q dy, and then the equation ought to satisfy

\frac{Np - Mq}{L} = Q - \frac{d R}{d x} = \frac{Rp - (P + Qy)q}{L}

Now since Q - \frac{d R}{d x} is to be a function only of x, L will likewise have to be accepted as a function only of x, so that q = 0 and dL = p dx; whence it will be the case that:

Q - \frac{d R}{d x} = \frac{Rp}{L}, or rather Q dx - dR = \frac{R dL}{L}

and therefore \frac{d L}{L} = \frac{Q dx}{R} - \frac{d R}{R}. Hence upon integration will be obtained lL = \int \frac{Q dx}{R} - lR, and taking e to be the number whose hyperbolic logarithm equals unity, it turns out that 

L = \frac{1}{R} e^{\int \frac{Q dx}{R}}

Then as a result of this discovered multiplier the integral equation will be:

\int \frac{P dx}{R} e^{\int \frac{Q dx}{R}} + y e^{\int \frac{Q dx}{R}} = Const.

Mathematical comments: In view of the previous theorem introducing the integrating factor approach using Euler’s idiosyncratic notation, Euler’s explanation of the solution to this problem is largely self-explanatory. Perhaps only the final step needs clarification. Having found the multiplier L we can multiply the original differential equation by it to get

L P dx + L Q y dx + L R dy = 0

or

\frac{P dx}{R} e^{\int \frac{Q dx}{R}} + \frac{Q dx}{R} y e^{\int \frac{Q dx}{R}} + e^{\int \frac{Q dx}{R}} dy = 0

or

\frac{P dx}{R} e^{\int \frac{Q dx}{R}} + d \big(y e^{\int \frac{Q dx}{R}}\big) = 0

This can now be integrated directly to get the final equation obtained by Euler. Although straightforward, it is interesting to compare Euler’s approach here with the type of argument usually given in modern treatments of linear first-order differential equations. In modern treatments the differential equation would usually be given as

\frac{d y}{d x} + Py = Q

We would then seek an integrating factor L(x) which would enable us to write

\frac{d}{dx}(y L) = QL

and this could then be integrated directly to find y. The unknown function L is found by first expanding the left-hand side to get

L \frac{d y}{d x} + y \frac{d L}{d x} = QL

Dividing through by L we get

\frac{d y}{d x} + \big(\frac{d L/d x}{L}\big)y = Q

Comparing this with the original equation we see that

\frac{1}{L} \frac{d L}{d x} = P

and so

L = e^{\int P dx}

Putting this into \frac{d}{dx}(y L) = QL and integrating directly we then get the required solution

y e^{\int P dx} = \int Q e^{\int P dx} dx + Const.

which is equivalent to the one obtained by Euler. \square
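The modern integrating-factor recipe can be exercised on a concrete case. In the sketch below (assuming SymPy; the choices P = 1/x and Q = x^2 are hypothetical, purely for illustration) the factor L = e^{\int P dx} is computed and the resulting solution is checked against the original equation:

```python
import sympy as sp

x, C = sp.symbols('x C', positive=True)   # positive x avoids log branch issues
P, Q = 1/x, x**2                          # hypothetical concrete choices

L = sp.exp(sp.integrate(P, x))            # integrating factor e^{int P dx} = x
y = (sp.integrate(Q*L, x) + C) / L        # from y e^{int P dx} = int Q e^{int P dx} dx + C

# check that y actually solves dy/dx + P y = Q
assert sp.simplify(y.diff(x) + P*y - Q) == 0
```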

In the next problem Euler considers a nonlinear first-order differential equation which today is known as Bernoulli’s equation:

Problem 5.

37. If the proposed differential equation is to be: 

P y^n dx + Q y dx + R dy = 0

where P, Q and R denote any functions whatever of x, find the multiplier which makes it integrable. 

Solution.

Then it will be that M = P y^n + Qy and N = R, and hence

\big(\frac{d M}{d y}\big) = n P y^{n - 1} + Q, and \big(\frac{d N}{d x}\big) = \frac{d R}{d x}

Therefore assuming that the multiplier in question is L and dL = p dx + q dy, based on our previous findings it will be the case that:

 \frac{Rp - P y^n q - Q y q}{L} = n P y^{n - 1} + Q - \frac{d R}{d x}

If it is imagined that L = S y^m, where S is a function only of x, it will be the case that p = \frac{y^m d S}{d x}, and q = m S y^{m - 1}, so that substituting these values will yield:

\frac{R d S}{S d x} - m P y^{n - 1} - m Q = n P y^{n - 1} + Q - \frac{d R}{d x}

In order for this equation to hold it is necessary to take m = -n, so it will become

\frac{R d S}{S d x} = (1 - n) Q - \frac{d R}{d x} , or \frac{d S}{S} = \frac{(1 - n) Q dx}{R} - \frac{d R}{R}

Then since integration will yield S = \frac{1}{R} e^{(1 - n) \int \frac{Q dx}{R}}, the required multiplier, with m = -n, will be

L = \frac{y^{-n}}{R} e^{(1 - n) \int \frac{Q dx}{R}}

and the integral equation will be

\frac{y^{1 - n}}{1 - n} e^{(1 - n) \int \frac{Q dx}{R}} + \int \frac{P dx}{R} e^{(1 - n) \int \frac{Q dx}{R}} =  Const.

Mathematical comments: Again in view of the previous material, Euler’s solution to this problem is largely self-explanatory. To get the final equation we multiply the original differential equation by the multiplier to get

L P y^n dx + L Q y dx + L R dy = 0

or

\frac{P dx}{R} e^{(1 - n) \int \frac{Q dx}{R}} + \frac{Q dx}{R} y^{1 - n} e^{(1 - n) \int \frac{Q dx}{R}} + y^{-n} e^{(1 - n) \int \frac{Q dx}{R}} dy = 0

or

\frac{P dx}{R} e^{(1 - n) \int \frac{Q dx}{R}} + d \big(\frac{y^{1 - n}}{1 - n} e^{(1 - n) \int \frac{Q dx}{R}}\big) = 0

and this last equation can now be integrated directly. As with the previous problem, it is interesting to compare Euler’s approach to solving this problem with the way it would be done in modern treatments. Prior to Euler, the equation in this problem was also studied by two Bernoulli brothers, James Bernoulli (1654-1705) and John Bernoulli (1667-1748), as well as by Gottfried Wilhelm Leibniz (1646-1716). Ironically, the modern approach is essentially the one developed by John Bernoulli before Euler tackled it (John Bernoulli was Euler’s tutor). In modern times the equation would usually be written as

\frac{d y}{d x} + P y = Q y^n

and John Bernoulli’s approach was to make the change of variable z = y^{1 - n}. Then

\frac{d z}{d x} = \frac{1 - n}{y^n} \frac{d y}{d x}

and we can rewrite the original equation using this new variable as

\frac{d z}{d x} + (1 - n) P z = (1 - n) Q

This is now a first-order linear equation of the type solved in the previous problem. Following the approach discussed there, we seek a multiplier L which enables us to write the equation as

\frac{d}{dx} (z L) = (1 - n) Q L

Expanding the left-hand side and comparing with the original equation we find that

\frac{1}{L} \frac{d L}{d x} = (1 - n) P

so the multiplier is

L = e^{(1 - n) \int P dx}

Then we can write

\frac{d}{dx}\big(z e^{(1 - n) \int P dx}\big) = (1 - n) Q e^{(1 - n) \int P dx}

This can be integrated directly to get

z e^{(1 - n) \int P dx} = (1 - n) \int Q e^{(1 - n) \int P dx} dx + Const.

or

y^{1-n} e^{(1 - n) \int P dx} = (1 - n) \int Q e^{(1 - n) \int P dx} dx + Const.

which is equivalent to Euler’s solution. \square
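As a sanity check on the solution formula, here is a short sympy computation using the illustrative (assumed) choices P(x) = 1, Q(x) = x and n = 2; the y recovered from the formula satisfies the original Bernoulli equation identically:

```python
import sympy as sp

# Bernoulli equation y' + P y = Q y^n with the assumed sample data
# P(x) = 1, Q(x) = x, n = 2
x, C = sp.symbols('x C')
P, Q, n = sp.Integer(1), x, 2

# Formula: y^(1-n) e^((1-n) int P dx) = (1-n) int Q e^((1-n) int P dx) dx + C
L = sp.exp((1 - n) * sp.integrate(P, x))       # the multiplier
y = (((1 - n) * sp.integrate(Q * L, x) + C) / L) ** sp.Rational(1, 1 - n)

# Substituting back into the equation, the residual should vanish
residual = sp.simplify(y.diff(x) + P * y - Q * y**n)
print(residual)  # the residual should simplify to zero
```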

The final extract from Euler’s paper that I want to include here is a problem dealing with a nonlinear differential equation now known as Riccati’s equation. The equation was apparently first referred to as `Riccati’s equation’ in 1770 (only a few years after Euler wrote this paper) by the mathematician Jean-Baptiste le Rond d’Alembert (1717-1783):

Problem 9. 

56. For this proposed differential equation:

dy + P y dx + Q y y dx + R dx = 0

where P, Q and R are functions only of x, and if it is known that this equation is satisfied by y = v, where v is a function of x, find the multipliers which make this equation integrable. 

Solution.

Since this equation is satisfied by the value y = v, it will be the case that

dv + P v dx + Q v v dx + R dx = 0;

if one then puts y = v + \frac{1}{z}, one will obtain

-\frac{d z}{zz} + \frac{P d x}{z} + \frac{2Qv d x}{z} + \frac{Q d x}{zz} = 0

or:

dz - (P + 2Qv)z dx - Qdx = 0

which is rendered integrable by means of the multiplier

e^{-\int(P + 2Qv)dx}.

This multiplier will thus fit the proposed equation when it itself is multiplied by zz. When therefore it is the case that z = \frac{1}{y - v}, the multiplier that will render the proposed equation integrable will be

\frac{1}{(y - v)^2} \ e^{-\int(P + 2Qv)dx}

For the sake of brevity let e^{-\int(P + 2Qv)dx} = S. Since the integral of the equation dz - (P + 2Qv)z dx - Qdx = 0 is

Sz - \int QS dx = Const.

all the required multipliers will be encompassed by this form:

\frac{S}{(y - v)^2} funct. \big(\frac{S}{y - v} - \int QS dx \big)

where by hypothesis v is a known function of x, and also S = e^{-\int(P + 2Qv)dx}.

Mathematical comments: In this problem Euler has developed a method for solving Riccati’s equation when a particular integral is known. The basic idea is that if the particular solution v(x) is known, then the change of variables y = v + \frac{1}{z} gives a simple linear first-order equation for z of the form dz - (P + 2Qv)z dx - Qdx = 0. This is because substituting y = v + \frac{1}{z} into Riccati’s equation we get

dy + Py dx + Qyy dx + Rdx =

\big(dv - \frac{1}{zz} dz\big) + \big(Pv dx + \frac{P dx}{z}\big) + \big(Qvv dx + \frac{2Qvdx}{z} + \frac{Qdx}{zz}\big) + Rdx =0

which reduces to

-\frac{d z}{zz} + \frac{P d x}{z} + \frac{2Qv d x}{z} + \frac{Q d x}{zz} = 0

\iff

dz - (P + 2Qv)z dx - Qdx = 0

The integrating factor which makes this simple linear first-order differential equation integrable can be found straightforwardly using the method of Problem 4, and is e^{-\int(P + 2Qv)dx}. Then the integrating factor that makes the equation

-\frac{d z}{zz} + \frac{P d x}{z} + \frac{2Qv d x}{z} + \frac{Q d x}{zz} = 0

integrable will be

zz e^{-\int(P + 2Qv)dx} = \frac{1}{(y - v)^2} \ e^{-\int(P + 2Qv)dx}

To clarify the last part of Euler’s solution, note that if we multiply through the equation dz - (P + 2Qv)z dx - Qdx = 0 by S = e^{-\int(P + 2Qv)dx} we get

Sdz - S(P + 2Qv)z dx - SQdx = 0

\iff

d(Sz) - QSdx = 0

This last equation can be integrated directly to give

Sz - \int QS dx = Const.

or

\frac{S}{y - v} - \int QS dx = Const.

Euler makes the point that in general the integrating factor for all cases of Riccati’s equation in which a particular solution v(x) is known will be of the form

\frac{S}{(y - v)^2} f\big(\frac{S}{y - v} - \int QS dx\big)

where f is some unspecified function. This works because the argument of the unspecified function f is constant along solutions of the equation, so df = 0. Thus, for example, starting from

-\frac{d z}{zz} + \frac{P d x}{z} + \frac{2Qv d x}{z} + \frac{Q d x}{zz} = 0

we can multiply through by -\frac{Sf}{(y - v)^2} instead of just -\frac{S}{(y - v)^2} to get

Sf dz - Sf (P + 2Qv) z dx - SfQ dx = 0

and since df = 0 we can add the term zS df to get

Sf dz - Sf (P + 2Qv) z dx + zSdf - SfQ dx = 0

\iff

d\big(Sfz\big) - QSf dx = 0

and this can be integrated directly to get

Sfz - \int QSf dx = Const. \square
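Euler's reduction can be verified symbolically. In the sketch below I take the (assumed) illustrative case P = 0, Q = 1, R = -1, for which v = 1 is a particular solution, and check that substituting y = v + 1/z into the Riccati equation reproduces the linear equation for z up to the factor -1/z^2:

```python
import sympy as sp

# Riccati equation y' + P y + Q y^2 + R = 0 with the assumed sample data
# P = 0, Q = 1, R = -1; v = 1 is then a particular solution
x = sp.symbols('x')
z = sp.Function('z')
P, Q, R, v = 0, 1, -1, 1

y = v + 1 / z(x)                       # Euler's change of variable
riccati = y.diff(x) + P * y + Q * y**2 + R

# Euler's reduced linear equation: z' - (P + 2 Q v) z - Q = 0
linear = z(x).diff(x) - (P + 2 * Q * v) * z(x) - Q

# The Riccati expression equals -1/z^2 times the linear one
residual = sp.simplify(riccati + linear / z(x)**2)
print(residual)  # the residual should simplify to zero
```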

On the electrodynamical part of Einstein’s paper ‘Zur Elektrodynamik bewegter Körper’

Albert Einstein (1879-1955) introduced his special theory of relativity in a 1905 paper Zur Elektrodynamik bewegter Körper (On the Electrodynamics of Moving Bodies, Annalen der Physik 17 (10): 891–921). The thoughtfulness and confidence of the 26-year-old Einstein are impressive, and what is also striking is that the paper contains no references to other research publications. The only person Einstein acknowledges is his best friend Michele Angelo Besso (1873-1955) with the now famous concluding sentence: Zum Schlusse bemerke ich, daß mir beim Arbeiten an dem hier behandelten Probleme mein Freund und Kollege M. Besso treu zur Seite stand und daß ich demselben manche wertvolle Anregung verdanke. (`In conclusion I wish to say that in working at the problem here dealt with I have had the loyal assistance of my friend and colleague M. Besso, and that I am indebted to him for several valuable suggestions.’) The two remained close until they died a month apart fifty years later.

While reading through Einstein’s work I could not help tinkering with his electrodynamical equations and comparing them to the modern equations of classical and relativistic electrodynamics. I want to record some of my jottings about this in the present note, particularly in relation to two key sections of his paper:

§3 Theorie der Koordinaten- und Zeittransformation… (Theory of the Transformation of Coordinates and Times…)

and

§6 Transformation der Maxwell-Hertzschen Gleichungen für den leeren Raum. (Transformation of the Maxwell-Hertz Equations for Empty Space.)

What Einstein envisages throughout these two sections is a pair of inertial reference frames K and k which are in what we now call `standard configuration’. That is, frame k is moving in the positive x-direction relative to frame K with speed v such that all the Cartesian coordinate axes in the two frames remain parallel and such that the origins and axes of the Cartesian coordinate systems in K and k coincide perfectly at some initial time point. Under these conditions the coordinates of an event using the coordinate system in the k-frame can be expressed in terms of the coordinates of the same event using the coordinate system in the K-frame by a set of transformation equations which Einstein derives in §3 of his paper, and which are now known as the Lorentz transformation equations. Einstein expresses these equations as follows on page 902 of the published paper:

(`It follows from this relation and the one previously found that \phi(v) = 1, so that the transformation equations which have been found become: \cdots.’) Note that Einstein uses the symbol V to denote the speed of light whereas we now use the symbol c.

Shortly after Einstein published this in 1905, Hermann Minkowski (1864-1909) realised that the special theory of relativity could be better understood by positing the existence of a four-dimensional spacetime. In Minkowski spacetime we have four-vectors consisting of a time coordinate, ct, and three spatial coordinates, x, y and z. Note that the time coordinate takes the form ct in order to make it have the same units as the spatial coordinates (i.e., units of length). The full coordinate transformations between Einstein’s two inertial frames K and k in standard configuration would be obtained in Minkowski spacetime as

\begin{bmatrix} ct^{\prime}\\ \ \\x^{\prime} \\ \ \\ y^{\prime} \\ \ \\ z^{\prime} \end{bmatrix} = \begin{bmatrix} \beta & -\beta \big(\frac{v}{c}\big) & 0 & \  0\\ \ \\ -\beta \big(\frac{v}{c}\big) & \beta & 0 & \ 0 \\ \ \\0 & 0 & \  1 & \  \ 0 \\ \ \\ 0 & 0 & \  0 & \  \  1 \end{bmatrix} \begin{bmatrix} ct \\ \ \\x \\ \ \\ y \\ \ \\ z \end{bmatrix}

where, using the same notation as Einstein,

\beta = \frac{1}{\sqrt{1 - \big(\frac{v}{c}\big)^2}}

The coefficient matrix is actually a rank-2 tensor of type (1, 1) usually denoted by \Lambda^{\mu}_{\hphantom{\mu} \nu}. It is the application of this matrix to a four-vector in the K-frame that would produce the required Lorentz transformation to a corresponding (primed) four-vector in the k-frame when K and k are in standard configuration.
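A quick numerical check that this matrix really is a Lorentz transformation: it should preserve the Minkowski metric, and hence the spacetime interval of any event. The speed v = 0.6c and the sample event below are arbitrary assumptions:

```python
import numpy as np

# The standard-configuration Lorentz matrix should satisfy L^T eta L = eta
v_over_c = 0.6                                  # assumed sample speed
beta = 1.0 / np.sqrt(1.0 - v_over_c**2)         # Einstein's beta (modern gamma)

Lam = np.array([[beta, -beta * v_over_c, 0, 0],
                [-beta * v_over_c, beta, 0, 0],
                [0, 0, 1, 0],
                [0, 0, 0, 1]])

eta = np.diag([1.0, -1.0, -1.0, -1.0])          # metric, signature (+, -, -, -)
assert np.allclose(Lam.T @ eta @ Lam, eta)

# The interval of an arbitrary sample event (ct, x, y, z) is invariant
event = np.array([2.0, 1.0, 0.5, -0.3])
interval = lambda e: e @ eta @ e
assert np.isclose(interval(event), interval(Lam @ event))
print("metric preserved")
```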

At the start of §6 on page 907 of the published paper Einstein writes the following:

(‘Let the Maxwell-Hertz equations for empty space hold for the stationary system K, so that we have: \ldots where (X, Y, Z) denotes the vector of the electric force, and (L, M, N) that of the magnetic force.’) The key point that Einstein is trying to make in §6 is that when the coordinate transformations he derived in §3 are applied to electromagnetic processes satisfying the above Maxwell-Hertz equations, it is found that the vectors (X, Y, Z) and (L, M, N) themselves satisfy transformation equations of the form

(page 909 of Einstein’s published paper). Here the primed letters are the components of the vectors with respect to the coordinate system in the k-frame and the unprimed letters are the components with respect to the coordinate system in the K-frame. These equations show that the components of (X, Y, Z) and (L, M, N) do not all remain unchanged when we switch from one inertial reference frame to another. In the standard configuration scenario, only the X and L components remain unchanged. In contrast, looking at the Y^{\prime} equation, for example, we see that an event which from the point of view of the k-frame would be regarded as being due solely to an `electric force’ Y^{\prime} would be regarded from the point of view of the K-frame as being due to a combination of an electric force Y and a magnetic force N. Thus, Einstein writes on page 910 that die elektrischen und magnetischen Kräfte keine von dem Bewegungszustande des Koordinatensystems unabhängige Existenz besitzen. (`electric and magnetic forces do not exist independently of the state of motion of the system of coordinates.’)

Not surprisingly, the terminology and notation that Einstein uses in 1905 seem rather archaic and obscure from a modern perspective and I could not help jotting down modern interpretations of what he was saying as I was reading his paper. To begin with, the `Maxwell-Hertz equations for empty space’ that Einstein refers to at the start can be viewed as arising from a simple scenario in which there is a changing charge and current distribution within a certain region of space, but we are considering the fields produced by this source of radiation in the free space outside the region. In this free space outside the region the charge and current densities (denoted by \rho and \vec{J} respectively) are everywhere zero and the differential form of the Maxwell equations in SI units reduces to

\nabla \cdot \vec{E} = 0

\nabla \cdot \vec{B} = 0

\mu_0 \epsilon_0 \frac{\partial \vec{E}}{\partial t} = \nabla \times \vec{B}

\frac{\partial \vec{B}}{\partial t} = - \nabla \times \vec{E}

where the parameters \epsilon_0 and \mu_0 and the speed of light c are related by the equation

c^2 = \frac{1}{\mu_0 \epsilon_0}
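This relation is easy to check numerically with the SI values of the constants:

```python
import numpy as np

# Numerical check of c^2 = 1/(mu_0 eps_0) using SI values of the constants
mu0 = 4 * np.pi * 1e-7          # vacuum permeability, H/m
eps0 = 8.8541878128e-12         # vacuum permittivity, F/m
c = 299792458.0                 # speed of light, m/s
assert np.isclose(1.0 / np.sqrt(mu0 * eps0), c, rtol=1e-6)
print("ok")
```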

The vector \vec{E} = (E_x, E_y, E_z) denotes the electric field, which is defined at any given point as the electric force per unit charge on an infinitesimally small positive test charge placed at that point. This corresponds to the vector (X, Y, Z) in Einstein’s paper which he called den Vektor der elektrischen Kraft (‘the vector of the electric force’). The vector \vec{B} = (B_x, B_y, B_z) denotes the magnetic field and this corresponds to the vector (L, M, N) in Einstein’s paper. A moving charge creates a magnetic field in all of space and this magnetic field exerts a force on any other moving charge. (Charges at rest do not produce magnetic fields, nor do magnetic fields exert forces on them.)

In his paper Einstein actually employs an alternative formulation of the above Maxwell equations in which quantities are measured in Gaussian units rather than SI units. This involves defining

\epsilon_0 = \frac{1}{4 \pi c}

(and therefore \mu_0 = \frac{4 \pi}{c}) and also rescaling the electric field vector as

\vec{E} = c^{-1} \vec{E}_{SI}

where \vec{E}_{SI} denotes the electric field vector expressed in SI units. When these changes are substituted into the above Maxwell equations, the equations become

\nabla \cdot \vec{E} = 0

\nabla \cdot \vec{B} = 0

\frac{1}{c} \frac{\partial \vec{E}}{\partial t} = \nabla \times \vec{B}

\frac{1}{c} \frac{\partial \vec{B}}{\partial t} = - \nabla \times \vec{E}

The last two of these are the equations appearing at the start of §6 of Einstein’s paper, with V \equiv c there. Incidentally, these are also the equations that are responsible for the emergence of electromagnetic waves, as can easily be demonstrated by taking the curl of both sides of the equations to get

\frac{1}{c} \frac{\partial (\nabla \times \vec{E})}{\partial t} = \nabla \times (\nabla \times \vec{B})

\frac{1}{c} \frac{\partial (\nabla \times \vec{B})}{\partial t} = - \nabla \times (\nabla \times \vec{E})

On the left-hand sides I have taken the curl operator inside the partial time derivative, which is permissible since the curl involves only spatial derivatives. We can then replace the curl inside each partial derivative by the corresponding term in Maxwell’s equations. Using a standard identity in vector calculus the right-hand sides become

\nabla \times (\nabla \times \vec{B}) = \nabla (\nabla \cdot \vec{B}) - \nabla^{2} \vec{B}

\nabla \times (\nabla \times \vec{E}) = \nabla (\nabla \cdot \vec{E}) - \nabla^{2} \vec{E}

Since the divergence terms are zero in free space we are left only with the Laplacian terms on the right-hand sides. Putting the left and right-hand sides together we get the electromagnetic wave equations

\nabla^2 \vec{B} = \frac{1}{c^2} \frac{\partial^2 \vec{B}}{\partial t^2}

\nabla^2 \vec{E} = \frac{1}{c^2} \frac{\partial^2 \vec{E}}{\partial t^2}
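The vector-calculus identity used for the right-hand sides can be verified with sympy's vector module on an arbitrary (assumed) sample field:

```python
import sympy as sp
from sympy.vector import CoordSys3D, Vector, curl, divergence, gradient

# Verify curl(curl(E)) = grad(div(E)) - Laplacian(E) on a sample field
N = CoordSys3D('N')
x, y, z = N.x, N.y, N.z
comps = [y * z, x**2, sp.sin(x * y)]           # arbitrary smooth components
units = [N.i, N.j, N.k]
E = sum((c * u for c, u in zip(comps, units)), Vector.zero)

# Component-wise vector Laplacian
lap = sum(((c.diff(x, 2) + c.diff(y, 2) + c.diff(z, 2)) * u
           for c, u in zip(comps, units)), Vector.zero)

residual = curl(curl(E)) - (gradient(divergence(E)) - lap)
print([sp.simplify(residual.dot(u)) for u in units])  # should be all zeros
```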

With regard to Einstein’s demonstration that the electric and magnetic fields get mixed together under a Lorentz transformation, we would show this now via a Lorentz transformation of a rank-2 electromagnetic field tensor of the form

F^{\mu \nu} = \begin{bmatrix} 0 & \frac{1}{c} E_x & \frac{1}{c} E_y & \frac{1}{c} E_z \\ \ \\ -\frac{1}{c} E_x & 0 & B_z & -B_y \\ \ \\-\frac{1}{c} E_y & -B_z & 0 & B_x \\ \ \\ -\frac{1}{c}E_z & B_y & -B_x & 0 \end{bmatrix}

Note that we are using SI units here, which is why the coefficient \frac{1}{c} appears in front of the electric field components. This coefficient would disappear if we were using Gaussian units. To apply the Lorentz transformation to the electromagnetic field tensor we observe that since F^{\mu \nu} is a rank-2 tensor of type (2, 0) it transforms according to

F^{\prime \  \alpha \beta} = \frac{\partial x^{\prime \alpha}}{\partial x^{\mu}} \frac{\partial x^{\prime \beta}}{\partial x^{\nu}} F^{\mu \nu}

Since

\Lambda^{\alpha}_{\hphantom{\alpha} \mu} \equiv \frac{\partial x^{\prime \alpha}}{\partial x^{\mu}}

we obtain the required Lorentz transformation of the electromagnetic field tensor as

F^{\prime \  \alpha \beta} = \Lambda^{\alpha}_{\hphantom{\alpha} \mu} \Lambda^{\beta}_{\hphantom{\beta} \nu} F^{\mu \nu}

But \Lambda^{\beta}_{\hphantom{\beta} \nu} F^{\mu \nu} involves contracting over \nu, the second index of both \Lambda^{\beta}_{\hphantom{\beta} \nu} and F^{\mu \nu}, which is the same as the matrix multiplication

[F^{\mu \nu}] [\Lambda^{\beta}_{\hphantom{\beta} \nu}]^T

where the T denotes the matrix transpose. (Note that the transformation matrix for frames in standard configuration is symmetric, so it is unaffected by transposition). Applying \Lambda^{\alpha}_{\hphantom{\alpha} \mu} on the left then amounts to multiplying the above matrix product on the left by the matrix [\Lambda^{\alpha}_{\hphantom{\alpha} \mu}]. Thus, we are able to compute the required Lorentz transformation of the electromagnetic field tensor as a matrix product. We get

F^{\prime \  \alpha \beta} = \Lambda^{\alpha}_{\hphantom{\alpha} \mu} \Lambda^{\beta}_{\hphantom{\beta} \nu} F^{\mu \nu}

=  [\Lambda^{\alpha}_{\hphantom{\alpha} \mu}] [F^{\mu \nu}] [\Lambda^{\beta}_{\hphantom{\beta} \nu}]^T

= \begin{bmatrix} \beta & -\beta \big(\frac{v}{c}\big) & 0 & \  0\\ \ \\ -\beta \big(\frac{v}{c}\big) & \beta & 0 & \ 0 \\ \ \\0 & 0 & \  1 & \  \ 0 \\ \ \\ 0 & 0 & \  0 & \  \  1 \end{bmatrix} \begin{bmatrix} 0 & \frac{1}{c} E_x & \frac{1}{c} E_y & \frac{1}{c} E_z \\ \ \\ -\frac{1}{c} E_x & 0 & B_z & -B_y \\ \ \\-\frac{1}{c} E_y & -B_z & 0 & B_x \\ \ \\ -\frac{1}{c}E_z & B_y & -B_x & 0 \end{bmatrix} \begin{bmatrix} \beta & -\beta \big(\frac{v}{c}\big) & 0 & \  0\\ \ \\ -\beta \big(\frac{v}{c}\big) & \beta & 0 & \ 0 \\ \ \\0 & 0 & \  1 & \  \ 0 \\ \ \\ 0 & 0 & \  0 & \  \  1 \end{bmatrix}

= \begin{bmatrix} 0 & \frac{E_x}{c} & \beta \bigg(\frac{E_y}{c} - \big(\frac{v}{c} \big) B_z \bigg) & \beta \bigg(\frac{E_z}{c} + \big(\frac{v}{c} \big) B_y \bigg)\\ \ \\ -\frac{E_x}{c} & 0 & \beta \bigg(B_z - \big(\frac{v}{c} \big) \frac{E_y}{c} \bigg) & -\beta \bigg(B_y + \big(\frac{v}{c} \big) \frac{E_z}{c} \bigg) \\ \ \\ -\beta \bigg(\frac{E_y}{c} - \big(\frac{v}{c} \big) B_z \bigg) & -\beta \bigg(B_z - \big(\frac{v}{c} \big) \frac{E_y}{c} \bigg) & 0 & B_x \\ \ \\ -\beta \bigg(\frac{E_z}{c} + \big(\frac{v}{c} \big) B_y \bigg) & \beta \bigg(B_y + \big(\frac{v}{c} \big) \frac{E_z}{c} \bigg) & -B_x & 0 \end{bmatrix}

\equiv \begin{bmatrix} 0 & \frac{E_x^{\prime}}{c} & \frac{E_y^{\prime}}{c} & \frac{E_z^{\prime}}{c} \\ \ \\ -\frac{E_x^{\prime}}{c} & 0 & B_z^{\prime} & -B_y^{\prime} \\ \ \\-\frac{E_y^{\prime}}{c} & -B_z^{\prime} & 0 & B_x^{\prime} \\ \ \\ -\frac{E_z^{\prime}}{c} & B_y^{\prime} & -B_x^{\prime} & 0 \end{bmatrix}

Comparing the entries in the last two matrices we get exactly the same relations as Einstein did on page 909 of his paper, except that we are using SI units here so the electric field components are divided by c:

\frac{E_x^{\prime}}{c} = \frac{E_x}{c}

\frac{E_y^{\prime}}{c} = \beta \bigg(\frac{E_y}{c} - \big(\frac{v}{c} \big) B_z \bigg)

\frac{E_z^{\prime}}{c} = \beta \bigg(\frac{E_z}{c} + \big(\frac{v}{c} \big) B_y \bigg)

B_x^{\prime} = B_x

B_y^{\prime} = \beta \bigg(B_y + \big(\frac{v}{c} \big) \frac{E_z}{c} \bigg)

B_z^{\prime} = \beta \bigg(B_z - \big(\frac{v}{c} \big) \frac{E_y}{c} \bigg)
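These component equations can be checked numerically by carrying out the matrix product \Lambda F \Lambda^T directly. The values of c, v and the field components below are arbitrary assumptions:

```python
import numpy as np

# Numerical check of F' = Lam F Lam^T for frames in standard configuration,
# with assumed sample values c = 1, v = 0.6, E = (1,2,3), B = (4,5,6)
c, v = 1.0, 0.6
beta = 1.0 / np.sqrt(1.0 - (v / c)**2)          # Einstein's beta (modern gamma)
Ex, Ey, Ez = 1.0, 2.0, 3.0
Bx, By, Bz = 4.0, 5.0, 6.0

Lam = np.array([[beta, -beta * v / c, 0, 0],
                [-beta * v / c, beta, 0, 0],
                [0, 0, 1, 0],
                [0, 0, 0, 1]])

F = np.array([[0, Ex / c, Ey / c, Ez / c],
              [-Ex / c, 0, Bz, -By],
              [-Ey / c, -Bz, 0, Bx],
              [-Ez / c, By, -Bx, 0]])

Fp = Lam @ F @ Lam.T   # the transformed field tensor

# Read off the primed fields and compare with the component formulas
assert np.isclose(Fp[0, 1], Ex / c)                            # E'_x = E_x
assert np.isclose(Fp[0, 2], beta * (Ey / c - (v / c) * Bz))    # E'_y / c
assert np.isclose(Fp[1, 3], -beta * (By + (v / c) * Ez / c))   # -B'_y
assert np.isclose(Fp[1, 2], beta * (Bz - (v / c) * Ey / c))    # B'_z
print("transformation verified")
```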

 

A note on reverse engineering the Navier-Stokes equations

The Navier-Stokes equations are the fundamental equations of fluid mechanics, analogous to, say, Maxwell’s equations in the case of electromagnetism. They are important for applications in science and engineering but they are difficult to solve analytically, so real-life applications rely almost exclusively on computer-aided methods. In fact, the equations are still not fully understood mathematically. Some basic questions about the existence and nature of possible solutions remain unanswered. Because of their importance, the Clay Mathematics Institute has offered a prize of one million dollars to anyone who can clarify some specific fundamental questions about them (see the CMI web page for details).

The equations are named after Claude-Louis Navier (1785-1836) and George Gabriel Stokes (1819-1903) who worked on their development independently. Some other leading contemporary mathematicians were also involved and I was intrigued to learn that all of these mathematicians used Euler’s equations of motion as a starting point in their derivations of the Navier-Stokes equations. Euler’s equations had been derived much earlier in 1757 by the great mathematician Leonhard Euler (1707-1783). In tensor notation, using the Einstein summation convention, the Euler equations take the form

\rho g^i - \frac{\partial p}{\partial x^i} = \rho \big(\frac{\partial v^i}{\partial t} + v^j \frac{\partial v^i}{\partial x^j}\big)

I explored a derivation of these equations from Newton’s second law in a previous post. To better understand how the two sets of equations are related, it occurred to me that it might be an interesting exercise, and probably not too difficult, to try to `reverse engineer’ the Navier-Stokes equations using Euler’s equations as a starting point. I want to briefly record my exploration of this idea in the present note.

I am interested in the incompressible fluid version of the Navier-Stokes equations which can be written in tensor form, again using the Einstein summation convention, as

\rho g^i - \frac{\partial p}{\partial x^i} + \mu \frac{\partial}{\partial x^j}\big(\frac{\partial v^i}{\partial x^j}\big) = \rho \big(\frac{\partial v^i}{\partial t} + v^j \frac{\partial v^i}{\partial x^j}\big)

where \mu > 0 is a viscosity coefficient specific to the fluid in question. Putting the Euler equations and the Navier-Stokes equations side by side like this makes it clear that the key difference between them is the addition of the viscosity-related terms on the left-hand side of the Navier-Stokes equations. To clarify things a bit more it is helpful to see the equations laid out fully. The Euler equations constitute a three-equation system which looks like this:

\rho g^1 - \frac{\partial p}{\partial x^1} = \rho \big(\frac{\partial v^1}{\partial t} + v^1 \frac{\partial v^1}{\partial x^1} + v^2 \frac{\partial v^1}{\partial x^2} + v^3 \frac{\partial v^1}{\partial x^3}\big)

\rho g^2 - \frac{\partial p}{\partial x^2} = \rho \big(\frac{\partial v^2}{\partial t} + v^1 \frac{\partial v^2}{\partial x^1} + v^2 \frac{\partial v^2}{\partial x^2} + v^3 \frac{\partial v^2}{\partial x^3}\big)

\rho g^3 - \frac{\partial p}{\partial x^3} = \rho \big(\frac{\partial v^3}{\partial t} + v^1 \frac{\partial v^3}{\partial x^1} + v^2 \frac{\partial v^3}{\partial x^2} + v^3 \frac{\partial v^3}{\partial x^3}\big)

The Navier-Stokes equations also constitute a three-equation system which looks like this:

\rho g^1 - \frac{\partial p}{\partial x^1} + \mu \bigg(\frac{\partial}{\partial x^1}\big(\frac{\partial v^1}{\partial x^1}\big) + \frac{\partial}{\partial x^2}\big(\frac{\partial v^1}{\partial x^2}\big) + \frac{\partial}{\partial x^3}\big(\frac{\partial v^1}{\partial x^3}\big)\bigg) = \rho \big(\frac{\partial v^1}{\partial t} + v^1 \frac{\partial v^1}{\partial x^1} + v^2 \frac{\partial v^1}{\partial x^2} + v^3 \frac{\partial v^1}{\partial x^3}\big)

\rho g^2 - \frac{\partial p}{\partial x^2} + \mu \bigg(\frac{\partial}{\partial x^1}\big(\frac{\partial v^2}{\partial x^1}\big) + \frac{\partial}{\partial x^2}\big(\frac{\partial v^2}{\partial x^2}\big) + \frac{\partial}{\partial x^3}\big(\frac{\partial v^2}{\partial x^3}\big)\bigg) = \rho \big(\frac{\partial v^2}{\partial t} + v^1 \frac{\partial v^2}{\partial x^1} + v^2 \frac{\partial v^2}{\partial x^2} + v^3 \frac{\partial v^2}{\partial x^3}\big)

\rho g^3 - \frac{\partial p}{\partial x^3} + \mu \bigg(\frac{\partial}{\partial x^1}\big(\frac{\partial v^3}{\partial x^1}\big) + \frac{\partial}{\partial x^2}\big(\frac{\partial v^3}{\partial x^2}\big) + \frac{\partial}{\partial x^3}\big(\frac{\partial v^3}{\partial x^3}\big)\bigg) = \rho \big(\frac{\partial v^3}{\partial t} + v^1 \frac{\partial v^3}{\partial x^1} + v^2 \frac{\partial v^3}{\partial x^2} + v^3 \frac{\partial v^3}{\partial x^3}\big)

Looking at the pattern of the indices appearing in the viscosity-related terms on the left-hand side of the Navier-Stokes equations, one is immediately reminded of the indices identifying the positions of the elements of a matrix and we can in fact arrange the viscosity-related terms in matrix form as

\begin{bmatrix} \frac{\partial}{\partial x^1}\big(\frac{\partial v^1}{\partial x^1}\big) & \frac{\partial}{\partial x^2}\big(\frac{\partial v^1}{\partial x^2}\big) & \frac{\partial}{\partial x^3}\big(\frac{\partial v^1}{\partial x^3}\big)\\ \ \\ \frac{\partial}{\partial x^1}\big(\frac{\partial v^2}{\partial x^1}\big) & \frac{\partial}{\partial x^2}\big(\frac{\partial v^2}{\partial x^2}\big) & \frac{\partial}{\partial x^3}\big(\frac{\partial v^2}{\partial x^3}\big)\\ \ \\ \frac{\partial}{\partial x^1}\big(\frac{\partial v^3}{\partial x^1}\big) & \frac{\partial}{\partial x^2}\big(\frac{\partial v^3}{\partial x^2}\big) & \frac{\partial}{\partial x^3}\big(\frac{\partial v^3}{\partial x^3}\big)\end{bmatrix}

From the point of view of reverse engineering the Navier-Stokes equations this is highly suggestive because in deriving the Euler equations we encountered one key matrix, the rank-2 stress tensor \tau^{ij}. Could the above matrix be related to the 3 \times 3 stress tensor? The answer is yes: as will be shown below, for incompressible flow the sum of the entries in each row of the above matrix equals \frac{1}{\mu} times the sum of the entries in the corresponding row of the matrix of first-order partials of the stress tensor,

\begin{bmatrix} \frac{\partial \tau^{11}}{\partial x^1} & \frac{\partial \tau^{12}}{\partial x^2} & \frac{\partial \tau^{13}}{\partial x^3}\\ \ \\ \frac{\partial \tau^{21}}{\partial x^1} & \frac{\partial \tau^{22}}{\partial x^2} & \frac{\partial \tau^{23}}{\partial x^3}\\ \ \\ \frac{\partial \tau^{31}}{\partial x^1} & \frac{\partial \tau^{32}}{\partial x^2} & \frac{\partial \tau^{33}}{\partial x^3} \end{bmatrix}

In deriving the Euler equations what we did was to assume that the fluid was friction-free so that there were no shear forces acting on any of the faces of a given fluid volume element. This means that the fluid was being modelled as being inviscid, i.e., having no viscosity. As a result, we kept only the diagonal elements of the stress tensor and re-interpreted these as pressures. Approximating the net pressure in each direction on a fluid volume element using a Taylor series expansion we were then led to include only the first-order partials of the diagonal terms of the stress tensor (interpreted as the first-order partials of pressure) in the Euler equations.

In the case of the Navier-Stokes equations we are no longer assuming that the fluid is inviscid so we are including all the first-order partials of the stress tensor in the equations. Thus, the Navier-Stokes equations could be written as

\rho g^1 - \frac{\partial p}{\partial x^1} + \frac{\partial \tau^{11}}{\partial x^1} + \frac{\partial \tau^{12}}{\partial x^2} + \frac{\partial \tau^{13}}{\partial x^3} = \rho \big(\frac{\partial v^1}{\partial t} + v^1 \frac{\partial v^1}{\partial x^1} + v^2 \frac{\partial v^1}{\partial x^2} + v^3 \frac{\partial v^1}{\partial x^3}\big)

\rho g^2 - \frac{\partial p}{\partial x^2} + \frac{\partial \tau^{21}}{\partial x^1} + \frac{\partial \tau^{22}}{\partial x^2} + \frac{\partial \tau^{23}}{\partial x^3} = \rho \big(\frac{\partial v^2}{\partial t} + v^1 \frac{\partial v^2}{\partial x^1} + v^2 \frac{\partial v^2}{\partial x^2} + v^3 \frac{\partial v^2}{\partial x^3}\big)

\rho g^3 - \frac{\partial p}{\partial x^3} + \frac{\partial \tau^{31}}{\partial x^1} + \frac{\partial \tau^{32}}{\partial x^2} + \frac{\partial \tau^{33}}{\partial x^3} = \rho \big(\frac{\partial v^3}{\partial t} + v^1 \frac{\partial v^3}{\partial x^1} + v^2 \frac{\partial v^3}{\partial x^2} + v^3 \frac{\partial v^3}{\partial x^3}\big)

Detailed consideration of the forces acting on an infinitesimal volume element in the case of a viscous fluid leads to expressions for the stress tensor components as functions of the first-order partials of the components of the velocity vector of the form

\tau^{ij} = \tau^{ji} = \mu \big(\frac{\partial v^i}{\partial x^j} + \frac{\partial v^j}{\partial x^i}\big)

The first-order partials of these are then of the form

\frac{\partial \tau^{ij}}{\partial x^j} = \mu \frac{\partial }{\partial x^j}\big(\frac{\partial v^i}{\partial x^j}\big) + \mu \frac{\partial }{\partial x^j}\big(\frac{\partial v^j}{\partial x^i}\big)

But note that with incompressible flow the mass density of the fluid is a constant so the divergence of the velocity vanishes (this is implied directly by the continuity equation with constant mass density). Therefore the second term on the right-hand side is zero since

\frac{\partial }{\partial x^j}\big(\frac{\partial v^j}{\partial x^i}\big) = \frac{\partial }{\partial x^i}\big(\frac{\partial v^j}{\partial x^j}\big)

= \frac{\partial }{\partial x^i}\big(\frac{\partial v^1}{\partial x^1}\big) + \frac{\partial }{\partial x^i}\big(\frac{\partial v^2}{\partial x^2}\big) + \frac{\partial }{\partial x^i}\big(\frac{\partial v^3}{\partial x^3}\big) = \frac{\partial }{\partial x^i} \text{div}(\vec{v}) = 0

Thus we are left with

\frac{\partial \tau^{ij}}{\partial x^j} = \mu \frac{\partial }{\partial x^j}\big(\frac{\partial v^i}{\partial x^j}\big)

which are the extra terms appearing on the left-hand side of the Navier-Stokes equations.
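This last step can be confirmed symbolically: for a divergence-free sample velocity field (an assumption chosen for illustration), the divergence of row i of the stress tensor reduces to \mu times the Laplacian of v^i:

```python
import sympy as sp

# Check that for a divergence-free velocity field the divergence of the
# viscous stress tensor reduces to mu * Laplacian(v^i).  The sample field
# v = (sin(x2), sin(x3), sin(x1)) is an assumption chosen so that div v = 0.
x1, x2, x3, mu = sp.symbols('x1 x2 x3 mu')
X = [x1, x2, x3]
v = [sp.sin(x2), sp.sin(x3), sp.sin(x1)]

assert sum(vj.diff(xj) for vj, xj in zip(v, X)) == 0   # incompressibility

# tau^{ij} = mu (dv^i/dx^j + dv^j/dx^i)
tau = [[mu * (v[i].diff(X[j]) + v[j].diff(X[i])) for j in range(3)]
       for i in range(3)]

for i in range(3):
    div_tau = sum(tau[i][j].diff(X[j]) for j in range(3))
    laplacian = mu * sum(v[i].diff(X[j], 2) for j in range(3))
    assert sp.simplify(div_tau - laplacian) == 0
print("viscous term checked")
```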

 

A fluid-mechanical visualisation of the quantum-mechanical continuity equation

The concept of a probability current is useful in quantum mechanics for analysing quantum scattering and tunnelling phenomena, among other things. However, I have noticed that the same rather abstract and non-visual approach to introducing probability currents is repeated almost verbatim in every textbook (also see, e.g., this Wikipedia article). The standard approach essentially involves defining a probability current from the outset as

\vec{j} = \frac{i \hbar}{2m}(\Psi \nabla \Psi^{*} - \Psi^{*} \nabla \Psi)

and then using Schrödinger’s equation to show that this satisfies a fluid-like continuity equation of the form

\frac{\partial \rho}{\partial t} + \nabla \cdot \vec{j} = 0

with

\rho \equiv \Psi^{*} \Psi
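Although the present note takes a different route, the standard continuity equation itself is easy to verify symbolically. The sketch below checks it in one dimension for an (assumed) superposition of two free-particle plane waves, each satisfying the free Schrödinger equation:

```python
import sympy as sp

# One-dimensional check of the continuity equation for a superposition of
# two free plane waves (an assumed example), with omega = hbar k^2 / (2 m)
x, t = sp.symbols('x t', real=True)
hbar, m, k1, k2 = sp.symbols('hbar m k1 k2', positive=True)

def plane_wave(k):
    return sp.exp(sp.I * (k * x - hbar * k**2 / (2 * m) * t))

psi = plane_wave(k1) + plane_wave(k2)
rho = sp.conjugate(psi) * psi
j = sp.I * hbar / (2 * m) * (psi * sp.conjugate(psi).diff(x)
                             - sp.conjugate(psi) * psi.diff(x))

residual = sp.simplify(sp.expand(rho.diff(t) + j.diff(x)))
print(residual)  # the residual should simplify to zero
```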

In the present note I want to briefly explore a more intuitive and visual approach involving a model of the actual flow of a `probability fluid’. I want to begin with a fluid-mechanical model and then obtain the standard expression for the quantum-mechanical continuity equation from this, rather than starting out with an abstract definition of the probability current and then showing that this satisfies a continuity equation. The essential problem one faces when trying to do this is that although in classical mechanics the position \vec{r}(t) of a point particle and its velocity \vec{v}(t) = d\vec{r}(t)/dt are well defined, this is not the case in conventional quantum mechanics. Quantum mechanics is done probabilistically, treating a particle as a wave packet such that the square of the amplitude of the corresponding wave function acts as a probability density which can be used to measure the probability that the particle will occupy a particular region of space at a particular time. It is not possible to say definitively where a particular particle will be at a particular time in quantum mechanics, which makes it difficult to apply the conventional deterministic equations of fluid mechanics.

A huge specialist literature on quantum hydrodynamics has in fact arisen which tries to circumvent this problem in a number of ways. A standard reference is Wyatt, R. E., 2005, Quantum Dynamics with Trajectories: Introduction to Quantum Hydrodynamics (Springer). The route that a large part of this literature has taken is intriguing because it is based on Bohmian mechanics, an approach to quantum mechanics developed by David Bohm in 1952 which is regarded by most mainstream physicists today as unconventional. The key feature of the Bohmian mechanics approach is that classical-like particle trajectories are possible. Using this approach one can obtain Newtonian-like equations of motion analogous to those in conventional fluid mechanics and this is how this particular literature seems to have chosen to treat quantum particle trajectories in a fluid-like way. Attempts have also been made to introduce mathematically equivalent approaches, but defined within conventional quantum mechanics (see, e.g., Brandt, S. et al, 1998, Quantile motion and tunneling, Physics Letters A, Volume 249, Issue 4, pp. 265-270).

In the present note I am not looking to solve any elaborate problems so I will simply consider a free quantum wave packet which is not acted upon by any forces and try to visualise probability currents and the quantum continuity equation in a fluid-like way by using the momentum vector operator \hat{\vec{p}} to characterise the velocity of the particle. I will then show that the probability current obtained in this fluid-mechanical model is the same as the one defined abstractly in textbooks.

In quantum mechanics, calculations are done using operators to represent observables. Every possible observable that one might be interested in for the purposes of experiment has a corresponding operator which the mathematics of quantum mechanics can work on to produce predictions. The key operator for the purposes of the present note is the momentum vector operator

\hat{\vec{p}} = -i \hbar \nabla

which is the quantum mechanical analogue of the classical momentum vector

\vec{p} = m \vec{v}

The key idea for the present note is to regard the velocity vector of the quantum particle as being represented by the operator

\frac{\hat{\vec{p}}}{m}

by analogy with the classical velocity vector which can be obtained as

\vec{v} = \frac{\vec{p}}{m}

We will imagine the total probability mass

\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \Psi^{*} \Psi \text{d}V = 1

as a fluid in steady flow throughout the whole of space and obeying mass conservation. The fact that the flow is steady reflects the fact that there are no forces acting on the quantum particle in this model, so we must have

\frac{\partial }{\partial t}\big[\frac{\hat{\vec{p}}}{m}\big] = 0

The velocity can vary from point to point in the probability fluid but at any given point it cannot be varying over time.
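As a quick sanity check of the normalisation condition above (not part of the original argument), we can take a hypothetical Gaussian wave packet, frozen at one instant for simplicity, and verify with sympy that its total probability mass integrates to 1 over all space:

```python
import sympy as sp

x, y, z = sp.symbols('x y z', real=True)
sigma = sp.symbols('sigma', positive=True)

# An illustrative normalised 3D Gaussian wave packet (time frozen for simplicity)
r2 = x**2 + y**2 + z**2
Psi = (sp.pi * sigma**2) ** sp.Rational(-3, 4) * sp.exp(-r2 / (2 * sigma**2))

# Total probability mass: triple integral of Psi* Psi over all space
rho = sp.conjugate(Psi) * Psi
total = sp.integrate(rho, (x, -sp.oo, sp.oo), (y, -sp.oo, sp.oo), (z, -sp.oo, sp.oo))
print(sp.simplify(total))  # 1
```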

In a classical fluid we have a mass density per unit volume \rho and we regard the velocity vector \vec{v} as a volumetric flow rate per unit area, i.e., the volume of fluid that would pass through a unit area per unit time. Then \rho \vec{v} is the mass flow rate per unit area, i.e., the mass of fluid that would pass through a unit area per unit time. In quantum mechanics we can regard the probability mass density per unit volume \rho \equiv \Psi^{*} \Psi as analogous to the mass density of a classical fluid. We can interpret \frac{\hat{\vec{p}}}{m} as the volumetric flow rate per unit area, i.e., the volume of probability fluid that would pass through a unit area per unit time. When doing probability calculations with quantum mechanical operators we usually `sandwich’ the operator between \Psi^{*} and \Psi, so following that approach here we can define the probability current density as

\Psi^{*} \frac{\hat{\vec{p}}}{m} \Psi

This is to be interpreted as the probability mass flow rate per unit area, i.e., the amount of probability mass that would pass through a unit area per unit time, analogous to \vec{j} = \rho \vec{v} in the classical case. To see how close the analogy is, suppose the quantum wave function is that of a plane wave

\Psi(x, y, z, t) = A\mathrm{e}^{i(\vec{k} \cdot \vec{r} - \omega t)}

Then

\Psi^{*} \frac{\hat{\vec{p}}}{m} \Psi = A \mathrm{e}^{-i(\vec{k} \cdot \vec{r} - \omega t)} \frac{(-i \hbar)}{m} \nabla A \mathrm{e}^{i(\vec{k} \cdot \vec{r} - \omega t)}

= A \mathrm{e}^{-i(\vec{k} \cdot \vec{r} - \omega t)} \frac{(-i \hbar)}{m} A \ i \ \vec{k} \ \mathrm{e}^{i(\vec{k} \cdot \vec{r} - \omega t)}

= A^2 \frac{\hbar \vec{k}}{m}

= \rho \vec{v}

which looks just like the mass flow rate in the classical case with \rho = \Psi^{*} \Psi and \vec{v} \equiv \frac{\hbar \vec{k}}{m}. Note that in this example the probability current density formula we are using, namely \Psi^{*} \frac{\hat{\vec{p}}}{m} \Psi, turned out to be real-valued. Unfortunately this will not always be the case. Since the probability current vector must always be real-valued, the fluid-mechanical model in the present note will only be applicable in cases when this is true for the formula \Psi^{*} \frac{\hat{\vec{p}}}{m} \Psi.
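The plane-wave calculation above can be checked symbolically. The following sketch does the one-dimensional version with sympy, with A, k and omega standing for the amplitude, wavenumber and angular frequency in the expression for \Psi:

```python
import sympy as sp

x, t, A, k, omega, hbar, m = sp.symbols('x t A k omega hbar m', real=True, positive=True)

# One-dimensional plane wave
Psi = A * sp.exp(sp.I * (k * x - omega * t))

# Probability current density: Psi* (p_hat / m) Psi with p_hat = -i hbar d/dx
j = sp.conjugate(Psi) * (-sp.I * hbar / m) * sp.diff(Psi, x)
print(sp.simplify(j))  # A**2*hbar*k/m, i.e. rho * (hbar k / m)
```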

As in classical fluid mechanics, a continuity equation can now be derived by considering the net outflow of probability mass from an infinitesimal fluid element of volume \mathrm{d}V \equiv \mathrm{d} x \mathrm{d} y \mathrm{d} z.

Considering only the y-component for the moment, we see from the diagram that on the left-hand side we have the probability mass flow rate coming into the volume element through the left-hand face. The mass flow rate coming out of the fluid element through the right-hand face can be approximated using a Taylor series expansion as being equal to the mass flow rate through the left-hand face plus a differential adjustment based on the gradient of the probability current density and the length \mathrm{d} y. The net probability mass flow rate in the y-direction is then obtained by subtracting the left-hand term from the right-hand term to get

\frac{\partial }{\partial y} \big(\Psi^{*}\frac{\hat{\vec{p}}}{m} \Psi\big) \mathrm{d}V

Using similar arguments for the x and z-directions, the net mass flow rate out of the fluid element in all three directions is then

\nabla \cdot \big(\Psi^{*}\frac{\hat{\vec{p}}}{m} \Psi \big) \mathrm{d} V

Now, the probability mass inside the fluid element is \rho \mathrm{d} V where \rho = \Psi^{*} \Psi and if there is a net outflow of probability fluid this mass will be decreasing at the rate

- \frac{\partial \rho}{\partial t} \mathrm{d} V

Equating the two expressions and dividing through by the volume of the fluid element we get the equation of continuity

\frac{\partial \rho}{\partial t} + \nabla \cdot \big(\Psi^{*}\frac{\hat{\vec{p}}}{m} \Psi \big) = 0
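The continuity equation can be verified on a concrete example. The sketch below takes a one-dimensional superposition of two free-particle plane waves (an illustrative choice, with the free-particle dispersion relation \omega = \hbar k^2 / 2m assumed), so that \rho and the current genuinely vary in space and time; since \Psi^{*} \frac{\hat{\vec{p}}}{m} \Psi is not real on its own for this superposition, its real part is taken, in line with the caveat above:

```python
import sympy as sp

x, t = sp.symbols('x t', real=True)
hbar, m = sp.symbols('hbar m', positive=True)
k1, k2 = sp.symbols('k1 k2', real=True)

def plane_wave(k):
    # Free-particle dispersion relation: omega = hbar k^2 / (2 m)
    return sp.exp(sp.I * (k * x - hbar * k**2 / (2 * m) * t))

# Superposition of two plane waves: rho and j vary nontrivially in x and t
Psi = plane_wave(k1) + plane_wave(k2)

rho = sp.conjugate(Psi) * Psi
# Real part of Psi* (p_hat/m) Psi, since the raw expression is complex here
j = sp.re(sp.conjugate(Psi) * (-sp.I * hbar / m) * sp.diff(Psi, x))

residual = sp.simplify(sp.diff(rho, t) + sp.diff(j, x))
print(residual)  # 0: the continuity equation holds
```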

What I want to do now is show that if we work out \Psi^{*}\frac{\hat{\vec{p}}}{m} \Psi we will get the same formula for the probability current as the one usually given in quantum mechanics textbooks. We have

\Psi^{*}\frac{\hat{\vec{p}}}{m} \Psi

= - \frac{i \hbar}{m} \Psi^{*} \nabla \Psi

= - \frac{i \hbar}{2m} \Psi^{*} \nabla \Psi - \frac{i \hbar}{2m} \Psi^{*} \nabla \Psi

We now note that since the probability current density must be a real vector, and the two terms in the last equality are equal halves of it, each term must itself be real. A real quantity is unchanged by complex conjugation, so taking the complex conjugate of the second term in the last equality we get

\Psi^{*}\frac{\hat{\vec{p}}}{m} \Psi

= - \frac{i \hbar}{2m} \Psi^{*} \nabla \Psi + \frac{i \hbar}{2m} \Psi \nabla \Psi^{*}

= \frac{i \hbar}{2m}(\Psi \nabla \Psi^{*} - \Psi^{*} \nabla \Psi)

= \vec{j}

This is exactly the expression for the probability current density that appears in textbooks, but rather than introducing it `out of nowhere’ at the beginning, we have obtained it naturally as a result of a fluid-mechanical model.
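The realness of the textbook expression can be made explicit by writing \Psi = u + iw with real functions u and w. The sketch below (one-dimensional, for brevity) shows that the current collapses to \frac{\hbar}{m}(u \partial_x w - w \partial_x u), with no imaginary unit surviving:

```python
import sympy as sp

x = sp.symbols('x', real=True)
hbar, m = sp.symbols('hbar m', positive=True)
u = sp.Function('u', real=True)(x)
w = sp.Function('w', real=True)(x)

# Split the wavefunction into real and imaginary parts: Psi = u + i w
Psi = u + sp.I * w
Psic = u - sp.I * w   # complex conjugate

# Textbook probability current (1D component)
j = sp.I * hbar / (2 * m) * (Psi * sp.diff(Psic, x) - Psic * sp.diff(Psi, x))

# Expanding leaves hbar*(u*w' - w*u')/m: manifestly real
print(sp.simplify(sp.expand(j)))
```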

Calculation of a quantum-mechanical commutator in three dimensions

I needed to work out the commutator [\hat{H}, \hat{\vec{r}} \ ], where

\hat{H} = -\frac{\hbar^2}{2m} \nabla^2 + \hat{U}

is the Hamiltonian operator and \hat{\vec{r}} is the 3D position vector operator. It is difficult to find any textbook or online source that explicitly goes through the calculation of this three-dimensional case (in fact, I have not been able to find any) so I am recording my calculation step-by-step in this note.

The commutator [\hat{H}, \hat{\vec{r}} \ ] is a vector operator with components

[\hat{H}, \hat{x} \ ] = \hat{H} \ \hat{x} - \hat{x} \ \hat{H}

[\hat{H}, \hat{y} \ ] = \hat{H} \ \hat{y} - \hat{y} \ \hat{H}

and

[\hat{H}, \hat{z} \ ] = \hat{H} \ \hat{z} - \hat{z} \ \hat{H}

To evaluate these, note that the momentum operator (in position space) is

\hat{\vec{p}} = -i \ \hbar \nabla

and so we have

\hat{H} = -\frac{\hbar^2}{2m} \nabla^2 + \hat{U}

= \frac{1}{2m} \hat{\vec{p}} \cdot \hat{\vec{p}} + \hat{U}

= \frac{1}{2m}(\hat{p}_x^{2} + \hat{p}_y^{2} + \hat{p}_z^{2}) + \hat{U}

Looking at the x-component of [\hat{H}, \hat{\vec{r}} \ ] we therefore have

[\hat{H}, \hat{x} \ ] = \hat{H} \ \hat{x} - \hat{x} \ \hat{H} = \frac{\hat{p}_x^{2} \hat{x} + \hat{p}_y^{2} \hat{x} + \hat{p}_z^{2} \hat{x}}{2m} + \hat{U} \ \hat{x} - \big(\frac{\hat{x} \hat{p}_x^{2} + \hat{x} \hat{p}_y^{2} + \hat{x} \hat{p}_z^{2}}{2m} + \hat{x} \ \hat{U}\big)

= \frac{1}{2m}([\hat{p}_x^2, \hat{x} \ ] + [\hat{p}_y^2, \hat{x} \ ] + [\hat{p}_z^2, \hat{x} \ ]) + [\hat{U}, \hat{x} \ ]

Since \hat{U} and \hat{x} both act by simple multiplication, and multiplication is commutative, we have [\hat{U}, \hat{x} \ ] = 0. I will now show that we also have

[\hat{p}_y^2, \hat{x} \ ] = [\hat{p}_z^2, \hat{x} \ ] = 0

To see this, let us first work out in detail the effect of [\hat{p}_y, \hat{x} \ ] on a wavefunction \Psi. We have

[\hat{p}_y, \hat{x} \ ] \Psi = - i \ \hbar \frac{\partial (\hat{x} \Psi)}{\partial y} + \hat{x }i \ \hbar \frac{\partial \Psi}{\partial y}

= - i \ \hbar \hat{x} \frac{\partial \Psi}{\partial y} - i \ \hbar \Psi \frac{\partial \hat{x}}{\partial y} + \hat{x} i \ \hbar \frac{\partial \Psi}{\partial y}

= - i \ \hbar \Psi \frac{\partial \hat{x}}{\partial y} = 0

where the last equality follows from the fact that \hat{x} does not depend on y. Thus, [\hat{p}_y, \hat{x} \ ] = 0.

We can now easily show that [\hat{p}_y^2, \hat{x} \ ] = 0 because using the basic result for commutators that

[AB, C] = A[B, C] + [A, C]B

(easy to prove by writing out the terms in full) we find that

[\hat{p}_y^2, \hat{x} \ ] = \hat{p}_y \ [\hat{p}_y, \hat{x} \ ] + [\hat{p}_y, \hat{x} \ ] \ \hat{p}_y = 0
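The operator identity invoked in this step can itself be checked mechanically using sympy's noncommutative symbols, which stand in for arbitrary operators A, B, C:

```python
import sympy as sp

# Noncommuting symbols stand in for the operators
A, B, C = sp.symbols('A B C', commutative=False)

def comm(X, Y):
    return X * Y - Y * X

# The identity [AB, C] = A[B, C] + [A, C]B
lhs = comm(A * B, C)
rhs = A * comm(B, C) + comm(A, C) * B
print(sp.expand(lhs - rhs))  # 0
```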

Identical arguments show that [\hat{p}_z^2, \hat{x} \ ] = 0. Thus, we can conclude that

[\hat{H}, \hat{x} \ ] = \hat{H} \ \hat{x} - \hat{x} \ \hat{H} = \frac{1}{2m} [\hat{p}_x^2, \hat{x} \ ]

It now only remains to work out [\hat{p}_x^2, \hat{x} \ ] and we can do this by first working out in detail the effect of [\hat{p}_x, \hat{x} \ ] on a wavefunction \Psi (this is of course the `canonical commutation relation’ of quantum mechanics). We have

[\hat{p}_x, \hat{x} \ ] \Psi = - i \ \hbar \frac{\partial (\hat{x} \Psi)}{\partial x} + \hat{x } i \ \hbar \frac{\partial \Psi}{\partial x}

= - i \ \hbar \hat{x} \frac{\partial \Psi}{\partial x} - i \ \hbar \Psi \frac{\partial \hat{x}}{\partial x} + \hat{x} i \ \hbar \frac{\partial \Psi}{\partial x}

= - i \ \hbar \Psi \frac{\partial \hat{x}}{\partial x} = - i \ \hbar \Psi

where the last equality follows from the fact that \hat{x} is the same as multiplying by x so its derivative with respect to x equals 1. Thus, [\hat{p}_x, \hat{x} \ ] = - i \ \hbar. Then we find that

[\hat{p}_x^2, \hat{x} \ ] = \hat{p}_x \ [\hat{p}_x, \hat{x} \ ] + [\hat{p}_x, \hat{x} \ ] \ \hat{p}_x = -2 i \ \hbar \hat{p}_x

and we can conclude that

[\hat{H}, \hat{x} \ ] = \hat{H} \ \hat{x} - \hat{x} \ \hat{H} = \frac{1}{2m} [\hat{p}_x^2, \hat{x} \ ] = - \frac{i \ \hbar}{m} \hat{p}_x

Identical arguments show that

\hat{H} \ \hat{y} - \hat{y} \ \hat{H} = \frac{1}{2m} [\hat{p}_y^2, \hat{y} \ ] = - \frac{i \ \hbar}{m} \hat{p}_y

and

\hat{H} \ \hat{z} - \hat{z} \ \hat{H} = \frac{1}{2m} [\hat{p}_z^2, \hat{z} \ ] = - \frac{i \ \hbar}{m} \hat{p}_z

Thus, we finally reach our desired expression for the Hamiltonian-position commutator in three dimensions:

[\hat{H}, \hat{\vec{r}} \ ] = - \frac{i \ \hbar}{m} \hat{\vec{p}}
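The x-component of this result can be verified directly by applying the operators to a generic test wavefunction \Psi(x, y, z) with a generic multiplicative potential U(x, y, z), a sketch of which is:

```python
import sympy as sp

x, y, z = sp.symbols('x y z', real=True)
hbar, m = sp.symbols('hbar m', positive=True)
Psi = sp.Function('Psi')(x, y, z)
U = sp.Function('U')(x, y, z)   # arbitrary multiplicative potential

lap = lambda f: sp.diff(f, x, 2) + sp.diff(f, y, 2) + sp.diff(f, z, 2)
H = lambda f: -hbar**2 / (2 * m) * lap(f) + U * f       # Hamiltonian operator
px = lambda f: -sp.I * hbar * sp.diff(f, x)             # momentum x-component

# x-component of [H, r]: (H x - x H) applied to the test wavefunction
lhs = H(x * Psi) - x * H(Psi)
rhs = -sp.I * hbar / m * px(Psi)
print(sp.simplify(sp.expand(lhs - rhs)))  # 0
```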

As an application of this result we will consider the problem of working out the expectation of position and velocity for a quantum particle. In three-dimensional space the quantum wave function is

\Psi = \Psi(x, y, z, t)

and we obtain the probability density function as

\rho(x, y, z, t) = \Psi^{*} \Psi

The wavefunction \Psi satisfies the time-dependent Schrödinger equation

i \hbar \frac{\partial \Psi}{\partial t} = -\frac{\hbar^2}{2m}\big(\frac{\partial^2 \Psi}{\partial x^2} + \frac{\partial^2 \Psi}{\partial y^2} + \frac{\partial^2 \Psi}{\partial z^2}\big) + U\Psi

where U = U(x, y, z, t) is some potential energy function. We can write the Schrödinger equation in operator form using Dirac notation as

\hat{H} |\Psi \rangle = i \hbar \frac{\partial}{\partial t} |\Psi \rangle

where

\hat{H} = -\frac{\hbar^2}{2m} \nabla^2 + \hat{U}

is the Hamiltonian operator (the Hamiltonian form of total energy) and

i \hbar \frac{\partial }{\partial t}

is the total energy operator. Note that the complex conjugate of the wavefunction \Psi^{*} satisfies the Schrödinger equation written in Dirac notation as

\langle \Psi | \hat{H} = - \langle \Psi | i \hbar \frac{\partial}{\partial t}

In quantum mechanics we find the expected position \langle \vec{r}(t) \rangle of the particle by integrating the position operator \hat{\vec{r}} over all space, sandwiched between \Psi^{*} and \Psi. Thus, letting \mathrm{d}V \equiv \mathrm{d} x \mathrm{d} y \mathrm{d} z we have

\langle \vec{r}(t) \rangle = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \Psi^{*} \  \hat{\vec{r}} \ \Psi \text{d}V \equiv \langle \Psi \  | \hat{\vec{r}} \ | \Psi \rangle

where in the last term I have switched to using Dirac notation which will be useful shortly. The expected velocity can then be obtained by differentiating this integral with respect to t. We get

\langle \vec{v}(t) \rangle = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \big[ \frac{\partial \Psi^{*}}{\partial t} \ \hat{\vec{r}} \ \Psi + \Psi^{*} \ \hat{\vec{r}} \  \frac{\partial \Psi}{\partial t} \big] \text{d}V + \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \Psi^{*} \ \frac{\partial \hat{\vec{r}}}{\partial t} \  \Psi \text{d}V

The second triple integral on the right-hand side is zero because the position operator does not depend on time. The integrand in the first triple integral can be manipulated by using the operator form of the Schrödinger equation and Dirac notation to write

\frac{\partial \Psi}{\partial t} = \frac{\partial}{\partial t} | \Psi \rangle = \frac{1}{i \ \hbar} \  \hat{H} \ | \Psi \rangle

and

\frac{\partial \Psi^{*}}{\partial t} = \langle \Psi | \frac{\partial}{\partial t} = -\frac{1}{i \ \hbar} \  \langle \Psi | \hat{H}

Thus, we have

\langle \vec{v}(t) \rangle = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \big[ \frac{\partial \Psi^{*}}{\partial t} \ \hat{\vec{r}} \ \Psi + \Psi^{*} \ \hat{\vec{r}} \  \frac{\partial \Psi}{\partial t} \big] \text{d}V

= - \frac{1}{i \ \hbar} \langle \Psi |\hat{H} \  \hat{\vec{r}} \  | \Psi \rangle + \frac{1}{i \ \hbar} \langle \Psi | \hat{\vec{r}} \ \hat{H} | \Psi \rangle

= \frac{i}{\hbar} \langle \Psi | \hat{H} \ \hat{\vec{r}} \ - \hat{\vec{r}} \ \hat{H} \ | \Psi \rangle

= \frac{i}{\hbar} \langle \Psi | \ [\hat{H}, \hat{\vec{r}} \ ] \ | \Psi \rangle

= \frac{1}{m} \langle \Psi | \ \hat{\vec{p}} \ | \Psi \rangle

= \big \langle \frac{\hat{\vec{p}}}{m} \big \rangle

where the last two equalities follow from the fact that [\hat{H}, \hat{\vec{r}} \ ] is the commutator of \hat{H} and \hat{\vec{r}}, which is equal to -\frac{i \hbar}{m} \hat{\vec{p}} as we saw above. Therefore the expected velocity of a quantum particle looks a lot like the velocity of a classical particle (momentum divided by mass). The idea that quantum mechanical expectations exhibit Newtonian-like behaviour is the essence of the Ehrenfest Theorem of quantum mechanics.
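The Ehrenfest-type result \langle \vec{v} \rangle = \langle \hat{\vec{p}}/m \rangle can be checked numerically for a free particle. The sketch below (one-dimensional, with illustrative units \hbar = m = 1 and an arbitrarily chosen Gaussian packet) evolves the packet exactly in Fourier space and compares d\langle x \rangle / dt, obtained by a central difference, against \langle p \rangle / m:

```python
import numpy as np

# Free particle in 1D with hbar = m = 1 (illustrative units)
hbar = m = 1.0
N, L = 2048, 200.0
x = np.linspace(-L / 2, L / 2, N, endpoint=False)
dx = x[1] - x[0]
k = 2 * np.pi * np.fft.fftfreq(N, d=dx)

# Gaussian wave packet centred at x = 0 with mean momentum hbar*k0
k0, sigma = 1.5, 5.0
psi0 = np.exp(-x**2 / (4 * sigma**2) + 1j * k0 * x)
psi0 /= np.sqrt(np.sum(np.abs(psi0)**2) * dx)

def evolve(psi, t):
    # Exact free evolution: multiply by the kinetic phase in k-space
    return np.fft.ifft(np.fft.fft(psi) * np.exp(-1j * hbar * k**2 * t / (2 * m)))

def mean_x(psi):
    return np.sum(x * np.abs(psi)**2) * dx

# <p> is conserved for a free particle; compute it once from the momentum spectrum
phik2 = np.abs(np.fft.fft(psi0))**2
mean_p = np.sum(hbar * k * phik2) / np.sum(phik2)

# d<x>/dt via a central difference around t = 2
t, dt = 2.0, 1e-3
v = (mean_x(evolve(psi0, t + dt)) - mean_x(evolve(psi0, t - dt))) / (2 * dt)

print(v, mean_p / m)  # the two agree: <v> = <p>/m
```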

Derivation of Euler’s equations of motion for a perfect fluid from Newton’s second law

Having read a number of highly technical derivations of Euler’s equations of motion for a perfect fluid I feel that the mathematical meanderings tend to obscure the underlying physics. In this note I want to explore the derivation from a more physically intuitive point of view. The dynamics of a fluid element of mass m are governed by Newton’s second law which says that the vector sum of the forces acting on the fluid element is equal to the mass of the element times its acceleration. Thus,

\vec{F} = m \vec{a}

The net force \vec{F} can be decomposed into two distinct types of forces, so-called body forces \vec{W} that act on the entire fluid element (e.g., the fluid element’s weight due to gravity) and stresses \vec{S} such as pressures and shears that act upon the surfaces enclosing the fluid element. For the purposes of deriving the differential form of Euler’s equation we will focus on the net force per unit volume acting on the fluid element, \vec{f}, which we will decompose into a weight per unit volume \vec{w} and a net stress force per unit volume \vec{s}. The weight per unit volume is simply obtained as

\vec{w} = \rho \vec{g}

where

\rho = \frac{m}{V}

is the mass density of the fluid (i.e., mass per unit volume) and \vec{g} is the acceleration due to gravity. In index notation, the equation for the i-th component of the weight per unit volume is

w^i = \rho g^i

The net stress force per unit volume, \vec{s}, is a little more complicated to derive since it involves the rank-2 stress tensor \tau^{ij}. This tensor contains nine components and is usually represented as a 3 \times 3 symmetric matrix. In Cartesian coordinates the components along the main diagonal, namely \tau^{xx}, \tau^{yy} and \tau^{zz}, represent normal stresses, i.e., forces per unit area acting orthogonally to the planes whose normal vector is identified by the first superscript, as indicated in the diagram below. (Note that a stress is a force per unit area, so to convert a stress tensor component \tau^{ij} into a force it would be necessary to multiply it by the area over which it acts).

In the diagram each normal stress is shown as a tension, i.e., a normal stress pointing away from the surface. When a normal stress points towards the surface it acts upon, it is called a pressure.

The off-diagonal components of the stress tensor represent shear stresses, i.e., forces per unit area that point along the sides of the fluid element, parallel to these sides rather than normal to them. These shear stresses are shown in the following diagram.

Shear stresses only arise when there is some kind of friction in the fluid. A perfect fluid is friction-free so there are no shear stresses. Euler’s equation only applies to perfect fluids so for the derivation of the equation we can ignore the off-diagonal components of the stress tensor.

The normal stresses along the main diagonal are usually written as

\tau^{xx} = - p^x

\tau^{yy} = - p^y

\tau^{zz} = - p^z

where p stands for pressure and the negative sign reflects the fact that a pressure points in the opposite direction to a tension.

In a perfect fluid the pressure is isotropic, i.e., the same in all directions, so we have

p^x = p^y = p^z = p

Therefore the stress tensor of a perfect fluid with isotropic pressure reduces to

\tau^{ij} = -p \delta^{ij}

where \delta^{ij} is the Kronecker delta (and may be thought of here as the metric tensor of Cartesian 3-space).
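A small numerical sketch makes the meaning of this tensor concrete: for any unit normal \vec{n}, the traction (force per unit area) \tau \vec{n} on the corresponding surface is just -p\vec{n}, i.e., a pure pressure of the same magnitude in every direction. The values below are arbitrary illustrative choices:

```python
import numpy as np

p = 2.5                      # an arbitrary pressure value
tau = -p * np.eye(3)         # stress tensor of a perfect fluid: -p * delta_ij

# Traction on a surface with unit normal n: t_i = tau_ij n_j
n = np.array([1.0, 2.0, 2.0]) / 3.0   # an arbitrary unit normal
traction = tau @ n

print(traction)  # equals -p * n: a pure pressure, isotropic in direction
```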

Now suppose we consider the net stress (force per unit area) in the y-direction of an infinitesimal volume element.

The stress on the right-hand face can be approximated using a Taylor series expansion as being equal to the stress on the left plus a differential adjustment based on its gradient and the length dy. If we take the stress on the right to be pointing in the positive direction and the one on the left as pointing in the negative (opposite) direction, the net stress in the y-direction is given by

\tau^{yy} + \frac{\partial \tau^{yy}}{\partial y} dy - \tau^{yy} = \frac{\partial \tau^{yy}}{\partial y} dy

Similarly, the net stresses in the x and z-directions are

\frac{\partial \tau^{xx}}{\partial x} dx

and

\frac{\partial \tau^{zz}}{\partial z} dz

To convert these net stresses to net forces we multiply each one by the area on which it acts. Thus, the net forces on the fluid element (in vector form) are

\big(\frac{\partial \tau^{xx}}{\partial x} dxdydz\big) \vec{i}

\big(\frac{\partial \tau^{yy}}{\partial y} dxdydz\big) \vec{j}

\big(\frac{\partial \tau^{zz}}{\partial z} dxdydz\big) \vec{k}

The total net force on the fluid element is then

\big(\frac{\partial \tau^{xx}}{\partial x} \  \vec{i} + \frac{\partial \tau^{yy}}{\partial y} \  \vec{j} + \frac{\partial \tau^{zz}}{\partial z} \  \vec{k}\big) dxdydz

Switching from tensions to pressures using \tau^{ij} = -p \delta^{ij} and dividing through by the volume dxdydz we finally get the net stress force per unit volume to be

\vec{s} = -\big(\frac{\partial p}{\partial x} \  \vec{i} + \frac{\partial p}{\partial y} \  \vec{j} + \frac{\partial p}{\partial z} \  \vec{k}\big)

In index notation, the equation for the i-th component of this net stress force per unit volume is written as

s^i = -\frac{\partial p}{\partial x^i}

We have now completed the analysis of the net force on the left-hand side of Newton’s second law.

On the right-hand side of Newton’s second law we have mass times acceleration, where acceleration is the change in velocity with time. To obtain an expression for this we observe that the velocity of a fluid element may change for two different reasons. First, the velocity field may vary over time at each point in space. Second, the velocity may vary from point to point in space (at any given time). Thus, we consider the velocity field to be a function of the time coordinate as well as the three spatial coordinates, so

\vec{v} = \vec{v}(t, x, y, z) = v^x(t, x, y, z) \ \vec{i} + v^y(t, x, y, z) \ \vec{j} + v^z(t, x, y, z) \ \vec{k}

Considering the i-th component of this velocity field, the total differential is

dv^i = \frac{\partial v^i}{\partial t} \ dt + \frac{\partial v^i}{\partial x} \ dx + \frac{\partial v^i}{\partial y} \ dy + \frac{\partial v^i}{\partial z} \ dz

so the total derivative with respect to time is

\frac{dv^i}{dt} = \frac{\partial v^i}{\partial t} + v^x \frac{\partial v^i}{\partial x} + v^y \frac{\partial v^i}{\partial y} + v^z \frac{\partial v^i}{\partial z}

where I have used

v^x = \frac{dx}{dt}

v^y = \frac{dy}{dt}

v^z = \frac{dz}{dt}

We can write this more compactly using the Einstein summation convention as

\frac{dv^i}{dt} = \frac{\partial v^i}{\partial t} + v^j \frac{\partial v^i}{\partial x^j}

This is then the i-th component of the acceleration vector on the right-hand side of Newton’s second law. In component form, therefore, we can write mass times acceleration per unit volume for the fluid element as

\rho \big(\frac{\partial v^i}{\partial t} + v^j \frac{\partial v^i}{\partial x^j}\big)

This completes the analysis of the mass times acceleration term on the right-hand side of Newton’s second law.
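The material-derivative formula above can be illustrated with a concrete velocity field. The sketch below uses a steady two-dimensional stagnation-point flow \vec{v} = (ax, -ay, 0) (an illustrative choice, not taken from the note): even though the field is steady, the convective term v^j \partial v^i / \partial x^j produces a nonzero acceleration (a^2 x, a^2 y, 0):

```python
import sympy as sp

t, x, y, z, a = sp.symbols('t x y z a', real=True)
coords = (x, y, z)

# A steady stagnation-point flow (illustrative choice)
v = (a * x, -a * y, sp.Integer(0))

def material_derivative(v, i):
    # dv^i/dt = partial dv^i/dt + v^j dv^i/dx^j (sum over j)
    return sp.diff(v[i], t) + sum(v[j] * sp.diff(v[i], coords[j]) for j in range(3))

accel = [sp.simplify(material_derivative(v, i)) for i in range(3)]
print(accel)  # [a**2*x, a**2*y, 0]
```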

In per-unit-volume form, Newton’s second law for a fluid element is

\vec{w} + \vec{s} = \rho \vec{a}

and writing this in the component forms derived above we get the standard form of Euler’s equations of motion for a perfect fluid:

\rho g^i - \frac{\partial p}{\partial x^i} = \rho \big(\frac{\partial v^i}{\partial t} + v^j \frac{\partial v^i}{\partial x^j}\big)
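As a final check, a classic exact solution can be substituted into these equations. The sketch below verifies hydrostatic equilibrium (an illustrative example: fluid at rest with gravity pointing in -z, so the acceleration side vanishes and p = p_0 - \rho g z):

```python
import sympy as sp

x, y, z, rho, g, p0 = sp.symbols('x y z rho g p0', positive=True)
coords = (x, y, z)

# Hydrostatic equilibrium: fluid at rest, gravity pointing in -z
gvec = (0, 0, -g)
p = p0 - rho * g * z      # pressure decreasing with height

# With v = 0 the right-hand side of Euler's equation vanishes, so the
# left-hand side rho g^i - dp/dx^i must vanish component by component
residuals = [sp.simplify(rho * gvec[i] - sp.diff(p, coords[i])) for i in range(3)]
print(residuals)  # [0, 0, 0]
```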