Overview of the Lie theory of rotations

A Lie group is a group which is also a smooth manifold. Every Lie group has an associated tangent space at the identity called its Lie algebra. As a vector space, the Lie algebra is often easier to study than the associated Lie group and can reveal most of what we need to know about the group. This is one of the general motivations for Lie theory. All closed matrix groups (closed subgroups of the general linear group) are Lie groups. An example of a matrix Lie group is the D-dimensional rotation group SO(D). This group is linked to a set of D(D-1)/2 antisymmetric matrices which form a basis of the associated Lie algebra, usually denoted by \mathfrak{so}(D). Like all Lie algebras corresponding to Lie groups, the Lie algebra \mathfrak{so}(D) is characterised by a Lie bracket operation, which here takes the form of commutation relations between the above-mentioned antisymmetric matrices, satisfying the formula

[J_{(mn)}, J_{(pq)}] = \delta_{np}J_{(mq)} + \delta_{mq}J_{(np)} - \delta_{mp}J_{(nq)} - \delta_{nq}J_{(mp)}

The link between \mathfrak{so}(D) and SO(D) is provided by the matrix exponential map \mathrm{exp}: \mathfrak{so}(D) \rightarrow SO(D) in the sense that each point in the Lie algebra is mapped to a corresponding point in the Lie group by matrix exponentiation. Furthermore, the exponential map defines parametric paths passing through the identity element in the Lie group. The tangent vectors obtained by differentiating these parametric paths and evaluating the derivatives at the identity are the elements of the Lie algebra, showing that the Lie algebra is the tangent space of the associated Lie group manifold.

In the rest of this note I will unpack some aspects of the above brief summary without going too much into highly technical details. The Lie theory of rotations is based on a simple symmetry/invariance consideration, namely that rotations leave the scalar products of vectors invariant. In particular, they leave the lengths of vectors invariant. The Lie theory approach is much more easily generalisable to higher dimensions than the elementary trigonometric approach using the familiar rotation matrices in two and three dimensions. Instead of obtaining the familiar trigonometric rotation matrices by analysing the trigonometric effects of rotations, we will see below that they arise in Lie theory from the exponential map linking the Lie algebra \mathfrak{so}(D) to the rotation group SO(D), in a kind of matrix analogue of Euler’s formula e^{ix} = \mathrm{cos}x +  i \mathrm{sin}x.

Begin by considering rotations in D-dimensional Euclidean space as being implemented by multiplying vectors by a D \times D rotation matrix R(\vec{\theta}) which is a continuous function of some parameter vector \vec{\theta} such that R(\vec{0}) = I. In Lie theory we regard these rotations as being infinitesimally small, in the sense that they move us away from the identity by an infinitesimally small amount. If \mathrm{d}\vec{x} is the column vector of coordinate differentials, then the rotation embodied in R(\vec{\theta}) is implemented as

\mathrm{d}\vec{x}^{\ \prime} = R \mathrm{d}\vec{x}

Since we require lengths to remain unchanged after rotation, we have

\mathrm{d}\vec{x}^{\ \prime \ T} \mathrm{d}\vec{x}^{\ \prime} = \mathrm{d}\vec{x}^{\ T}R^T R \mathrm{d}\vec{x} = \mathrm{d}\vec{x}^{\ T}\mathrm{d}\vec{x}

which implies

R^T R = I

In other words, the matrix R must be orthogonal. Furthermore, since the determinant of a product is the product of the determinants, and the determinant of a transpose is the same as the original determinant, we can write

\mathrm{det}(R^T R) = (\mathrm{det}R)^2 = \mathrm{det}(I) = 1

Therefore we must have

\mathrm{det}(R) = \pm 1

But we can exclude the case \mathrm{det}(R) = -1 because orthogonal matrices with determinant -1 involve a reflection rather than a pure rotation. For example, the orthogonal matrix

\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}

has determinant -1 and results in a reflection in the x-axis when applied to a vector. Here we are only interested in rotations, which we can now define as having orthogonal transformation matrices R such that \mathrm{det}(R) = 1. Matrices which have unit determinant are called special, so focusing purely on rotations means that we are dealing exclusively with the set of special orthogonal matrices of dimension D, denoted by SO(D).

It is straightforward to verify that SO(D) constitutes a group under matrix multiplication. It is closed, since for R_1, R_2 \in SO(D) we have (R_1 R_2)^T (R_1 R_2) = R_2^T R_1^T R_1 R_2 = I and \mathrm{det}(R_1 R_2) = \mathrm{det}(R_1)\mathrm{det}(R_2) = 1, so a rotation matrix times a rotation matrix gives another rotation matrix (another property R(\vec{\theta}) needs to satisfy). It has an identity element I, each element R \in SO(D) has an inverse R^{-1} = R^T which itself belongs to SO(D), and matrix multiplication is associative.
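
These properties are easy to confirm numerically. Below is a minimal sketch (using SciPy's special_ortho_group sampler to draw random rotations, purely for illustration; the dimension D = 4 is an arbitrary choice) that checks closure and the transpose-inverse property:

```python
import numpy as np
from scipy.stats import special_ortho_group

D = 4
R1 = special_ortho_group.rvs(D)   # a random element of SO(4)
R2 = special_ortho_group.rvs(D)

P = R1 @ R2   # closure: the product of two rotations should again be a rotation
print(np.allclose(P.T @ P, np.eye(D)))      # True: P is orthogonal
print(np.isclose(np.linalg.det(P), 1.0))    # True: P is special
print(np.allclose(R1 @ R1.T, np.eye(D)))    # True: the inverse of R1 is simply R1^T
```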

The fact that SO(D) is also a differentiable manifold, and therefore a Lie group, follows in a technical way (which I will not delve into here) from the fact that SO(D) is a closed subgroup of the group of all invertible D \times D real matrices, usually denoted by GL(D, \mathbb{R}), and this itself is a manifold of dimension D^2. The latter fact is demonstrated easily by noting that the determinant function M \mapsto \mathrm{det}(M) is continuous on the space of all D \times D real matrices, and GL(D, \mathbb{R}) is the inverse image under this function of the open set \mathbb{R} - \{0\}. Thus, GL(D, \mathbb{R}) is an open subset of the D^2-dimensional linear space of all D \times D real matrices, and thus a manifold of dimension D^2. The matrix Lie group SO(D) is a manifold of dimension \frac{D(D-1)}{2}, not D^2. One way to appreciate this is to count constraints: there are D^2 elements in each R, but the condition R^T R = I is a symmetric matrix equation and therefore imposes \frac{D(D+1)}{2} independent equations linking them, so the number of `free' elements in each R \in SO(D) is only D^2 - \frac{D(D+1)}{2} = \frac{D(D-1)}{2}. We will see shortly that \frac{D(D-1)}{2} is also the dimension of \mathfrak{so}(D), which must be the case given that \mathfrak{so}(D) is to be the tangent space of the manifold SO(D) (the dimension of a manifold is the dimension of its tangent space).

If we now Taylor-expand R(\vec{\theta}) to first order about \vec{\theta} = \vec{0} we get

R(\vec{\theta}) \approx I + A

where A is an infinitesimal matrix of first order in \vec{\theta} and we will (for now) ignore terms like A^2, A^3, \ldots which are of second and higher order in \vec{\theta}. Now substituting R = I + A into R^T R = I and keeping only first-order terms, we get

(I + A)^T (I + A) = I + A^T + A + A^T A \approx I + A^T + A = I

\implies

A^T = -A

Thus, the matrix A must be antisymmetric. In fact, A will be a linear combination of some elementary antisymmetric basis matrices which play a crucial role in the theory, so we will explore this further. Since a sum of antisymmetric matrices is antisymmetric, and a scalar multiple of an antisymmetric matrix is antisymmetric, the set of all D \times D antisymmetric matrices is a vector space. This vector space has a basis provided by elementary antisymmetric matrices containing only two non-zero elements each, the two non-zero elements in each matrix appearing in corresponding positions either side of the main diagonal and having opposite signs (this is what makes the matrices antisymmetric). Since there are \frac{D(D-1)}{2} distinct pairs of possible off-diagonal positions for these two non-zero elements, the vector space has dimension \frac{D(D-1)}{2} and, as will be seen shortly, it in fact turns out to be the Lie algebra \mathfrak{so}(D). The basis matrices will be written as J_{(mn)}, where m and n identify the pair of corresponding off-diagonal positions in which the two non-zero elements appear. The indices m and n each run through the numbers 1, 2, \ldots, D, and with the pair m, n fixed, the element in the w-th row and k-th column of each matrix J_{(mn)} is given by the formula

(J_{(mn)})_{wk} = \delta_{mw} \delta_{nk} - \delta_{mk} \delta_{nw}

To clarify this, we will consider the antisymmetric basis matrices for D = 2, D = 3 and D = 4. In the case D = 2 we have \frac{D(D-1)}{2} = 1 so there is a single antisymmetric matrix. Setting m = 1, n = 2, we get (J_{(12)})_{12} = 1 - 0 = 1 and (J_{(12)})_{21} = 0 - 1 = -1 so the antisymmetric matrix is

J_{(12)} = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}

In the case D = 3 we have \frac{D(D-1)}{2} = 3 antisymmetric basis matrices corresponding to the three possible pairs of off-diagonal positions for the two non-zero elements in each matrix. Following the same approach as in the previous case, these can be written as

J_{(12)} = \begin{bmatrix} 0 & 1 & 0\\ -1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}

J_{(23)} = \begin{bmatrix} 0 & 0 & 0\\ 0 & 0 & 1\\ 0 & -1 & 0 \end{bmatrix}

J_{(31)} = \begin{bmatrix} 0 & 0 & -1\\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}

Finally, in the case D = 4 we have \frac{D(D-1)}{2} = 6 antisymmetric basis matrices corresponding to the six possible pairs of off-diagonal positions for the two non-zero elements in each matrix. These can be written as

J_{(12)} = \begin{bmatrix} 0 & 1 & 0 & 0\\ -1 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 \end{bmatrix}

J_{(23)} = \begin{bmatrix} 0 & 0 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & -1 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}

J_{(31)} = \begin{bmatrix} 0 & 0 & -1 & 0\\ 0 & 0 & 0 & 0\\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}

J_{(41)} = \begin{bmatrix} 0 & 0 & 0 & -1\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\\ 1 & 0 & 0 & 0 \end{bmatrix}

J_{(42)} = \begin{bmatrix} 0 & 0 & 0 & 0\\ 0 & 0 & 0 & -1\\ 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}

J_{(43)} = \begin{bmatrix} 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 \end{bmatrix}
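
As a small illustration, the following NumPy sketch (the helper name make_J is my own choice, and it assumes m \neq n) builds any J_{(mn)} directly from the defining formula above; it reproduces, for example, the matrices J_{(12)}, J_{(31)} and J_{(42)} just listed:

```python
import numpy as np

def make_J(D, m, n):
    """Basis matrix J_(mn): (J)_{wk} = delta_{mw} delta_{nk} - delta_{mk} delta_{nw},
    i.e. +1 in row m, column n and -1 in row n, column m (1-based indices, m != n)."""
    J = np.zeros((D, D))
    J[m - 1, n - 1] = 1.0
    J[n - 1, m - 1] = -1.0
    return J

print(make_J(2, 1, 2))   # the single basis matrix for D = 2
print(make_J(3, 3, 1))   # J_(31) for D = 3
print(make_J(4, 4, 2))   # J_(42) for D = 4
```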

So in the case of a general infinitesimal rotation in D-dimensional space of the form R(\vec{\theta}) \approx I + A, the antisymmetric matrix A will be a linear combination of the \frac{D(D-1)}{2} antisymmetric basis matrices J_{(mn)} of the form

A = \sum_{m < n} \theta_{(mn)} J_{(mn)}

But note that using the standard matrix exponential series we have

e^{A} = I + A + \frac{1}{2}A^2 + \cdots \approx I + A

This suggests

R(\vec{\theta}) \approx e^A

and in fact this relationship between rotations and the exponentials of antisymmetric matrices turns out to be exact, not just an approximation. To see this, observe that A and A^{\ T} commute since A^{\ T}A = AA^{\ T} = -A^2. This means that

(e^{A})^{\ T} e^A = e^{A^{\ T}} e^A = e^{A^{\ T} + A} = e^0 = I

(note that in matrix exponentiation e^Ae^B = e^{A+B} holds when A and B commute – see below). Since the diagonal elements of an antisymmetric matrix are always zero, we also have

\mathrm{det}(e^A) = e^{\mathrm{tr}(A)} = e^0 = 1

Thus, e^A is both special and orthogonal, so it must be an element of SO(D). Conversely, suppose e^A \in SO(D). Then we must have

(e^{A})^{\ T} e^A = I

\iff

e^{A^{\ T}}e^A = I

\iff

e^{A^{\ T}} = e^{-A}

\implies

A^{\ T} = -A

so A is antisymmetric (strictly, this last step is valid for A sufficiently close to 0, where the matrix exponential is injective).
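
The forward direction is easy to check numerically. The sketch below (the dimension and random seed are arbitrary choices) builds a random antisymmetric matrix, exponentiates it with scipy.linalg.expm, and confirms that the result is special orthogonal:

```python
import numpy as np
from scipy.linalg import expm

D = 5
rng = np.random.default_rng(0)

# A random antisymmetric matrix (equivalently, a linear combination of the J_(mn))
M = rng.standard_normal((D, D))
A = M - M.T

R = expm(A)
print(np.allclose(R.T @ R, np.eye(D)))     # True: R is orthogonal
print(np.isclose(np.linalg.det(R), 1.0))   # True: det(R) = e^{tr(A)} = e^0 = 1
```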

So we have a tight link between \mathfrak{so}(D) and SO(D) via matrix exponentiation. We can do a couple of things with this. First, for any real parameter t \in \mathbb{R} and antisymmetric basis matrix J_{(mn)}, we have R(t) \equiv e^{t J_{(mn)}} \in SO(D) and this defines a parametric path through SO(D) which passes through its identity element at t = 0. Differentiating with respect to t and evaluating the derivative at t = 0 we find that

R^{\ \prime}(0) = J_{(mn)}

which indicates that the antisymmetric basis matrices J_{(mn)} are tangent vectors of the manifold SO(D) at the identity, and that the \frac{D(D-1)}{2} antisymmetric basis matrices span the tangent space of SO(D). Another thing we can do with the matrix exponential map is quickly recover the elementary rotation matrix in the case D = 2. Noting that J_{(12)}^2 = -I and separating the exponential series into even and odd terms in the usual way, we find that

R(\theta) = e^{\theta J_{(12)}} = \mathrm{cos}\theta I + \mathrm{sin}\theta J_{(12)} = \begin{bmatrix} \mathrm{cos}\theta & \mathrm{sin}\theta \\ -\mathrm{sin}\theta & \mathrm{cos}\theta \end{bmatrix}

where the single real number \theta here is the angle of rotation. This is the matrix analogue of Euler’s formula e^{ix} = \mathrm{cos}x +  i \mathrm{sin}x that was mentioned earlier.
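
Here is a short numerical check of this closed form, together with the tangent-vector claim R^{\prime}(0) = J_{(mn)} from above (the angle 0.7 and the finite-difference step are arbitrary choices):

```python
import numpy as np
from scipy.linalg import expm

J12 = np.array([[0.0, 1.0], [-1.0, 0.0]])
theta = 0.7   # an arbitrary rotation angle

# Matrix analogue of Euler's formula: e^{theta J_(12)} = cos(theta) I + sin(theta) J_(12)
R = expm(theta * J12)
R_closed = np.cos(theta) * np.eye(2) + np.sin(theta) * J12
print(np.allclose(R, R_closed))   # True

# Tangent vector at the identity: central-difference derivative of e^{t J_(12)} at t = 0
h = 1e-6
R_prime_0 = (expm(h * J12) - expm(-h * J12)) / (2 * h)
print(np.allclose(R_prime_0, J12))   # True: R'(0) = J_(12)
```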

To further elucidate how the antisymmetric basis matrices J_{(mn)} form a Lie algebra which is closely tied to the matrix Lie group SO(D), we will show that the Lie bracket is closed (i.e., that the commutator of two antisymmetric matrices is itself antisymmetric), and that these commutators play a crucial role in ensuring the closure of the group SO(D) (i.e., in ensuring that a rotation multiplied by a rotation produces another rotation). First, suppose that A and B are two antisymmetric matrices. Then, since the transpose of a product is the product of the transposes in reverse order, we can write

([A, B])^{\ T} = (AB - BA)^{\ T} = (AB)^{\ T} - (BA)^{\ T}

= B^{\ T} A^{\ T}  - A^{\ T} B^{\ T} = BA - AB = - [A, B]

This shows that the commutator of two antisymmetric matrices is itself antisymmetric, so the commutator can be written as a linear combination of the antisymmetric basis matrices J_{(mn)}. Furthermore, since we can write A = \sum_{m < n} \theta_{(mn)} J_{(mn)} and B = \sum_{p < q} \theta_{(pq)}^{\ \prime} J_{(pq)}, we have

[A, B] = \sum_{m < n} \sum_{p < q} \theta_{(mn)} \theta_{(pq)}^{\ \prime}[J_{(mn)}, J_{(pq)}]

so every commutator between antisymmetric matrices can be written in terms of the commutators [J_{(mn)}, J_{(pq)}] of the antisymmetric basis matrices. Next, suppose we exponentiate the antisymmetric matrices A and B to obtain the rotations e^A and e^B. Since SO(D) is closed, it must be the case that

e^A e^B = e^C

where e^C is another rotation and therefore C is an antisymmetric matrix. To see the role of the commutator between antisymmetric matrices in ensuring this, we will expand both sides. For the left-hand side we get

e^A e^B = (I + A + \frac{1}{2}A^2 + \cdots)(I + B + \frac{1}{2}B^2 + \cdots)

= I + A + B + \frac{1}{2}A^2 + \frac{1}{2}B^2 + AB + \cdots

= I + A + B + \frac{1}{2}(A^2 + AB + BA + B^2) + \frac{1}{2}[A, B] + \cdots

= I + A + B + \frac{1}{2}(A + B)^2 + \frac{1}{2}[A, B] + \cdots

For the right-hand side we get

e^C =  I + C + \frac{1}{2}C^2 + \cdots

Equating the two expansions, and noting that to second order \frac{1}{2}C^2 \approx \frac{1}{2}(A + B)^2 so that these terms match, we get

C = A + B + \frac{1}{2}[A, B] + \cdots

where the remaining terms on the right-hand side are of third and higher order in A and B. A result known as the Baker-Campbell-Hausdorff formula shows that these remaining terms are in fact all nested commutators of A and B. The series for C with a few additional terms expressed in this way is

C = A + B + \frac{1}{2}[A, B]

+ \frac{1}{12}\big([A, [A, B]] + [B, [B, A]]\big)

- \frac{1}{24}[B, [A, [A, B]]]

- \frac{1}{720}\big([B, [B, [B, [B, A]]]] + [A, [A, [A, [A, B]]]]\big) + \cdots

This shows that in general e^A e^B \neq e^{A + B} unless A and B commute, in which case all the commutator terms in the series for C vanish and C reduces to A + B. Since the commutator of two antisymmetric matrices is itself antisymmetric, this result also shows that C is an antisymmetric matrix, and therefore e^C must indeed be a rotation.
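
A numerical sketch of this can be made by comparing the matrix logarithm of e^A e^B with the truncated BCH series for two small random antisymmetric matrices (the dimension, scale, truncation order and tolerance below are arbitrary choices):

```python
import numpy as np
from scipy.linalg import expm, logm

D = 3
rng = np.random.default_rng(1)

def random_antisym(scale=0.05):
    M = scale * rng.standard_normal((D, D))
    return M - M.T

def comm(X, Y):
    return X @ Y - Y @ X

A, B = random_antisym(), random_antisym()   # small, so the series converges quickly

C_exact = logm(expm(A) @ expm(B)).real
C_bch = (A + B + 0.5 * comm(A, B)
         + (comm(A, comm(A, B)) + comm(B, comm(B, A))) / 12.0
         - comm(B, comm(A, comm(A, B))) / 24.0)

print(np.allclose(C_exact, C_bch, atol=1e-4))   # True, up to the neglected higher-order terms
print(np.allclose(C_exact, -C_exact.T))         # True: C is antisymmetric, so e^C is a rotation
```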

Since every commutator between antisymmetric matrices can be written in terms of the commutators [J_{(mn)}, J_{(pq)}] of the antisymmetric basis matrices, a general formula for the latter would seem to be useful. In fact, the formula given earlier, namely

[J_{(mn)}, J_{(pq)}] = \delta_{np}J_{(mq)} + \delta_{mq}J_{(np)} - \delta_{mp}J_{(nq)} - \delta_{nq}J_{(mp)}

completely characterises the Lie algebra \mathfrak{so}(D). To conclude this note we will therefore derive this formula ab initio, starting from the formula

(J_{(mn)})_{wk} = \delta_{mw} \delta_{nk} - \delta_{mk} \delta_{nw}

for the wk-th element of each matrix J_{(mn)}. We have

[J_{(mn)}, J_{(pq)}] = J_{(mn)} J_{(pq)} - J_{(pq)} J_{(mn)}

Focus on J_{(mn)} J_{(pq)} first. Using the Einstein summation convention, the product of the w-th row of J_{(mn)} with the k-th column of J_{(pq)} is

(J_{(mn)})_{wz} (J_{(pq)})_{zk}

= (\delta_{mw} \delta_{nz} - \delta_{mz} \delta_{nw})(\delta_{pz} \delta_{qk} - \delta_{pk} \delta_{qz})

= \delta_{mw} \delta_{qk} \delta_{nz} \delta_{pz} + \delta_{nw} \delta_{pk} \delta_{mz} \delta_{qz} - \delta_{mw} \delta_{pk} \delta_{nz} \delta_{qz} - \delta_{nw} \delta_{qk} \delta_{mz} \delta_{pz}

Now focus on J_{(pq)} J_{(mn)}. The product of the w-th row of J_{(pq)} with the k-th column of J_{(mn)} is

(J_{(pq)})_{wz} (J_{(mn)})_{zk}

= (\delta_{pw} \delta_{qz} - \delta_{pz} \delta_{qw})(\delta_{mz} \delta_{nk} - \delta_{mk} \delta_{nz})

= \delta_{pw} \delta_{nk} \delta_{qz} \delta_{mz} + \delta_{qw} \delta_{mk} \delta_{pz} \delta_{nz} - \delta_{pw} \delta_{mk} \delta_{qz} \delta_{nz} - \delta_{qw} \delta_{nk} \delta_{pz} \delta_{mz}

So the element in the w-th row and k-th column of [J_{(mn)}, J_{(pq)}] is

\delta_{mw} \delta_{qk} \delta_{nz} \delta_{pz} + \delta_{nw} \delta_{pk} \delta_{mz} \delta_{qz} + \delta_{pw} \delta_{mk} \delta_{qz} \delta_{nz} + \delta_{qw} \delta_{nk} \delta_{pz} \delta_{mz}

- \delta_{mw} \delta_{pk} \delta_{nz} \delta_{qz} -  \delta_{nw} \delta_{qk} \delta_{mz} \delta_{pz} - \delta_{pw} \delta_{nk} \delta_{qz} \delta_{mz} - \delta_{qw} \delta_{mk} \delta_{pz} \delta_{nz}

But notice that

\delta_{nz} \delta_{pz} = \delta_{np}

and similarly for the other Einstein summation terms. Thus, the above sum reduces to

(\delta_{mw} \delta_{qk} - \delta_{qw} \delta_{mk})\delta_{np} + (\delta_{nw} \delta_{pk} - \delta_{pw} \delta_{nk})\delta_{mq}

+ (\delta_{pw} \delta_{mk} - \delta_{mw} \delta_{pk})\delta_{nq} + (\delta_{qw} \delta_{nk} - \delta_{nw} \delta_{qk})\delta_{mp}

But

(\delta_{mw} \delta_{qk} - \delta_{mk} \delta_{qw})\delta_{np} = \delta_{np} (J_{(mq)})_{wk}

(\delta_{nw} \delta_{pk} - \delta_{nk} \delta_{pw})\delta_{mq} = \delta_{mq} (J_{(np)})_{wk}

(\delta_{mk} \delta_{pw} - \delta_{mw} \delta_{pk})\delta_{nq} = - \delta_{nq} (J_{(mp)})_{wk}

(\delta_{nk} \delta_{qw} - \delta_{nw} \delta_{qk})\delta_{mp} = - \delta_{mp} (J_{(nq)})_{wk}

Thus the element in the w-th row and k-th column of [J_{(mn)}, J_{(pq)}] is

\delta_{np} (J_{(mq)})_{wk} + \delta_{mq} (J_{(np)})_{wk} - \delta_{mp} (J_{(nq)})_{wk} - \delta_{nq} (J_{(mp)})_{wk}

Extending this to the matrix as a whole gives the required formula:

[J_{(mn)}, J_{(pq)}] = \delta_{np}J_{(mq)} + \delta_{mq}J_{(np)} - \delta_{mp}J_{(nq)} - \delta_{nq}J_{(mp)}
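
As a final check, the sketch below (a brute-force loop over all index combinations for D = 4; the names delta and J are my own choices) verifies this commutation relation numerically, building each basis matrix directly from the element-wise formula used in the derivation:

```python
import numpy as np

D = 4
idx = range(1, D + 1)

def delta(a, b):
    return 1.0 if a == b else 0.0

def J(m, n):
    """(J_(mn))_{wk} = delta_{mw} delta_{nk} - delta_{mk} delta_{nw}, 1-based indices."""
    return np.array([[delta(m, w) * delta(n, k) - delta(m, k) * delta(n, w)
                      for k in idx] for w in idx])

ok = all(
    np.allclose(J(m, n) @ J(p, q) - J(p, q) @ J(m, n),
                delta(n, p) * J(m, q) + delta(m, q) * J(n, p)
                - delta(m, p) * J(n, q) - delta(n, q) * J(m, p))
    for m in idx for n in idx for p in idx for q in idx
)
print(ok)   # True: the relation holds for every index combination
```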