
Eigenvalue decomposition

To this point we have dealt frequently with the solution of the linear system \mathbf{A}\mathbf{x}=\mathbf{b}. Alongside this problem in its importance to linear algebra is the eigenvalue problem.

7.2.1 Complex matrices

A matrix with real entries can have complex eigenvalues. Therefore, we assume all matrices, vectors, and scalars may be complex in what follows. Recall that a complex number can be represented as a+ib for real a and b, where i^2=-1. The complex conjugate of x=a+ib is denoted \bar{x} and is given by \bar{x}=a-ib. The magnitude or modulus of a complex number z is

|z| = \sqrt{z\cdot \bar{z}}.

For the most part, “adjoint” replaces “transpose,” “hermitian” replaces “symmetric,” and “unitary matrix” replaces “orthogonal matrix” when applying our previous results to complex matrices.
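As a minimal NumPy sketch of these definitions (the matrix below is made up for illustration), the conjugate, modulus, and adjoint look like this:

```python
import numpy as np

z = 2 - 3j                                        # the complex scalar 2 - 3i
print(z.conjugate())                              # complex conjugate: (2+3j)
print(abs(z), np.sqrt((z * z.conjugate()).real))  # modulus, computed two ways

A = np.array([[1 + 2j, 3], [4j, 5]])
A_adj = A.conj().T               # adjoint (conjugate transpose) replaces transpose
H = A + A_adj                    # H equals its own adjoint, so it is hermitian
print(np.allclose(H, H.conj().T))  # True
```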

7.2.2 Eigenvalue decomposition

An easy rewrite of the eigenvalue definition (7.2.1) is that (\mathbf{A} - \lambda\mathbf{I}) \mathbf{x} = \boldsymbol{0}. Hence, (\mathbf{A} - \lambda\mathbf{I}) is singular, and it therefore must have a zero determinant. This is the property most often used to compute eigenvalues by hand.

The determinant \det(\mathbf{A} - \lambda \mathbf{I}) is called the characteristic polynomial. Its roots are the eigenvalues, so we know that an n\times n matrix has n eigenvalues, counting algebraic multiplicity.
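For a small made-up example, we can compare the roots of the characteristic polynomial with the eigenvalues returned by numpy.linalg.eigvals; numpy.poly returns the coefficients of the characteristic polynomial when given a square matrix. This is only a sketch for illustration, not a recommended computational method (see the discussion at the end of this section).

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])

# Coefficients of det(A - lambda*I), highest degree first
coeffs = np.poly(A)          # here: lambda^2 - 4*lambda + 3
print(np.roots(coeffs))      # roots of the characteristic polynomial: 3 and 1
print(np.linalg.eigvals(A))  # eigenvalues computed directly (same values, order may differ)
```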

Suppose that \mathbf{A}\mathbf{v}_k=\lambda_k\mathbf{v}_k for k=1,\ldots,n. We can summarize these as

\begin{split} \begin{bmatrix} \mathbf{A}\mathbf{v}_1 & \mathbf{A}\mathbf{v}_2 & \cdots & \mathbf{A}\mathbf{v}_n \end{bmatrix} &= \begin{bmatrix} \lambda_1 \mathbf{v}_1 & \lambda_2\mathbf{v}_2 & \cdots & \lambda_n \mathbf{v}_n \end{bmatrix}, \\[1mm] \mathbf{A} \begin{bmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_n \end{bmatrix} &= \begin{bmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_n \end{bmatrix} \begin{bmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{bmatrix}, \end{split}

which we write as

\mathbf{A} \mathbf{V} = \mathbf{V} \mathbf{D}.

If we find that \mathbf{V} is a nonsingular matrix, then we arrive at a key factorization.[1]
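A quick numerical sketch with numpy.linalg.eig, which returns the eigenvalues and a matrix \mathbf{V} whose columns are eigenvectors (the test matrix here is randomly generated for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

lam, V = np.linalg.eig(A)    # eigenvalues and matrix of eigenvectors
D = np.diag(lam)

print(np.allclose(A @ V, V @ D))                 # A V = V D
print(np.allclose(A, V @ D @ np.linalg.inv(V)))  # A = V D V^{-1}, since V is nonsingular here
```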

Observe that if \mathbf{A}\mathbf{v} = \lambda \mathbf{v} for nonzero \mathbf{v}, then the equation remains true for any nonzero multiple of \mathbf{v}. Therefore, eigenvectors are not unique: any nonzero scalar multiple of an eigenvector is also an eigenvector for the same eigenvalue, and hence neither is the EVD unique.

We stress that while (7.2.6) is possible for all square matrices, (7.2.7) is not. One simple example of a nondiagonalizable matrix is

\mathbf{B} = \begin{bmatrix} 1 & 1\\0 & 1 \end{bmatrix}.
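For this particular \mathbf{B}, numpy.linalg.eig still returns candidate eigenvectors, but the two columns it produces are numerically parallel, so the resulting matrix is singular and no EVD exists; a small sketch:

```python
import numpy as np

B = np.array([[1.0, 1.0], [0.0, 1.0]])
lam, V = np.linalg.eig(B)

print(lam)                 # eigenvalue 1 with algebraic multiplicity 2
print(V)                   # the two columns are (nearly) parallel
print(np.linalg.cond(V))   # enormous condition number: V is numerically singular
```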

There is a common circumstance in which we can guarantee an EVD exists: when the n eigenvalues of an n\times n matrix are all distinct, the matrix is diagonalizable. The proof of this fact can be found in many elementary texts on linear algebra.

7.2.3 Similarity and matrix powers

The particular relationship between matrices \mathbf{A} and \mathbf{D} in (7.2.7) is important: if \mathbf{S} is any nonsingular matrix, we say that \mathbf{B}=\mathbf{S}\mathbf{A}\mathbf{S}^{-1} is similar to \mathbf{A}.

Hence, an EVD transforms \mathbf{A} to a similar matrix that happens to be diagonal, which is as simple as a matrix gets.

One way to interpret similarity is via change of basis (see Observation A.5):

\mathbf{B}\mathbf{x} = \mathbf{S}\mathbf{A}\mathbf{S}^{-1} \mathbf{x} = \underbrace{\mathbf{S} \underbrace{ \Bigl(\mathbf{A} \underbrace{\left( \mathbf{S}^{-1} \mathbf{x}\right)}_{\text{into $S$-basis}}\Bigr)}_{\text{apply $\mathbf{A}$}}}_{\text{out of $S$-basis}} .

That is, \mathbf{A} and \mathbf{B} represent the same linear transformation in different bases.

A similarity transformation does not change eigenvalues, a fact that is typically proved in elementary linear algebra texts.
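A quick numerical check of this fact, using an arbitrary (randomly generated, almost surely invertible) matrix \mathbf{S}:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
S = rng.standard_normal((4, 4))          # almost surely invertible

B = S @ A @ np.linalg.inv(S)             # B is similar to A

# Same eigenvalues, possibly listed in a different order
print(np.sort_complex(np.linalg.eigvals(A)))
print(np.sort_complex(np.linalg.eigvals(B)))
```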

The EVD is especially useful for matrix powers. To begin,

\mathbf{A}^2=(\mathbf{V}\mathbf{D}\mathbf{V}^{-1})(\mathbf{V}\mathbf{D}\mathbf{V}^{-1})=\mathbf{V}\mathbf{D}(\mathbf{V}^{-1}\mathbf{V})\mathbf{D}\mathbf{V}^{-1}=\mathbf{V}\mathbf{D}^2\mathbf{V}^{-1}.

Multiplying this result by \mathbf{A} repeatedly, we find that

\mathbf{A}^k = \mathbf{V}\mathbf{D}^k\mathbf{V}^{-1}.

Because \mathbf{D} is diagonal, its power \mathbf{D}^k is just the diagonal matrix of the kth powers of the eigenvalues.
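A sketch comparing (7.2.11) with direct computation via numpy.linalg.matrix_power, for a randomly generated (and therefore almost surely diagonalizable) matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))

lam, V = np.linalg.eig(A)
k = 5

# A^k = V D^k V^{-1}; D^k is just the elementwise kth powers of the eigenvalues
Ak = V @ np.diag(lam**k) @ np.linalg.inv(V)

print(np.allclose(Ak, np.linalg.matrix_power(A, k)))   # True, up to roundoff
```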

Furthermore, given a polynomial p(z)=c_0+c_1 z + \cdots + c_m z^m, we can apply the polynomial to the matrix in a straightforward way,

p(\mathbf{A}) = c_0\mathbf{I} +c_1 \mathbf{A} + \cdots + c_m \mathbf{A}^m.

Applying (7.2.11) leads to

\begin{split} p(\mathbf{A}) & = c_0\mathbf{V}\mathbf{V}^{-1} +c_1 \mathbf{V}\mathbf{D}\mathbf{V}^{-1} + \cdots + c_m \mathbf{V}\mathbf{D}^m\mathbf{V}^{-1} \\ &= \mathbf{V} \cdot [ c_0\mathbf{I} +c_1 \mathbf{D} + \cdots + c_m \mathbf{D}^m] \cdot \mathbf{V}^{-1} \\[1mm] &= \mathbf{V} \cdot \begin{bmatrix} p(\lambda_1) & & & \\ & p(\lambda_2) & & \\ & & \ddots & \\ & & & p(\lambda_n) \end{bmatrix} \cdot \mathbf{V}^{-1}. \end{split}
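A sketch of (7.2.13) for the made-up polynomial p(z) = 1 + 2z + 3z^2, checked against direct evaluation of p(\mathbf{A}):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3))
lam, V = np.linalg.eig(A)

c = [1.0, 2.0, 3.0]          # p(z) = 1 + 2z + 3z^2

# Direct evaluation: p(A) = c0*I + c1*A + c2*A^2
pA_direct = c[0] * np.eye(3) + c[1] * A + c[2] * (A @ A)

# Via the EVD: p(A) = V diag(p(lambda_i)) V^{-1}
p_lam = c[0] + c[1] * lam + c[2] * lam**2
pA_evd = V @ np.diag(p_lam) @ np.linalg.inv(V)

print(np.allclose(pA_direct, pA_evd))   # True, up to roundoff
```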

Finally, given the convergence of Taylor polynomials to common functions, we are able to apply a function f to a square matrix by replacing p with f in (7.2.13).
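For instance, taking f to be the exponential function, we can compare the EVD-based evaluation with scipy.linalg.expm (this sketch assumes SciPy is available and that \mathbf{A} is diagonalizable with a reasonably well-conditioned \mathbf{V}):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3))
lam, V = np.linalg.eig(A)

# f(A) = V diag(f(lambda_i)) V^{-1}, with f = exp
expA = V @ np.diag(np.exp(lam)) @ np.linalg.inv(V)

print(np.allclose(expA, expm(A)))   # True, up to roundoff
```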

7.2.4 Conditioning of eigenvalues

Just as linear systems have condition numbers that quantify the effect of finite precision, eigenvalue problems may be poorly conditioned too. While many possible results can be derived, we will use just one, the Bauer–Fike theorem.

The Bauer–Fike theorem tells us that eigenvalues can be perturbed by an amount that is \kappa(\mathbf{V}) times larger than perturbations to the matrix. This result is a bit less straightforward than it might seem: eigenvectors are not unique, so there are multiple possible values for \kappa(\mathbf{V}). Even so, the theorem indicates caution when a matrix has eigenvectors that form an ill-conditioned matrix. The limiting case of \kappa(\mathbf{V})=\infty might be interpreted as indicating a nondiagonalizable matrix \mathbf{A}. The other extreme is also of interest: \kappa(\mathbf{V})=1, which implies that \mathbf{V} is unitary.
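A numerical sketch of the bound referenced as (7.2.14), using a randomly generated matrix and a small random perturbation (both made up for illustration): every eigenvalue of the perturbed matrix should lie within \kappa(\mathbf{V})\,\|\mathbf{E}\| of some eigenvalue of \mathbf{A}, measured here in the 2-norm.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((5, 5))
lam, V = np.linalg.eig(A)

E = 1e-6 * rng.standard_normal((5, 5))       # small perturbation of A
lam_pert = np.linalg.eigvals(A + E)

# Distance from each perturbed eigenvalue to the nearest eigenvalue of A
dist = np.array([np.min(np.abs(lam - mu)) for mu in lam_pert])

bound = np.linalg.cond(V, 2) * np.linalg.norm(E, 2)
print(dist.max() <= bound)                   # True, as the theorem guarantees
```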

As we will see in Symmetry and definiteness, hermitian and real symmetric matrices are normal. Since the condition number of a unitary matrix is equal to 1, (7.2.14) guarantees that a perturbation of a normal matrix changes the eigenvalues by the same amount or less.

7.2.5 Computing the EVD

Roots of the characteristic polynomial are not used in numerical methods for finding eigenvalues.[2] Practical algorithms for computing the EVD go beyond the scope of this book. The essence of the matter is the connection to matrix powers indicated in (7.2.11). (We will see much more about the importance of matrix powers in Chapter 8.)

If the eigenvalues have different complex magnitudes, then as kk\to\infty the entries on the diagonal of Dk\mathbf{D}^k become increasingly well separated and easy to pick out. It turns out that there is an astonishingly easy and elegant way to accomplish this separation without explicitly computing the matrix powers.

The process demonstrated in Example 7.2.4 is known as the Francis QR iteration, and it can be formulated as an O(n^3) algorithm for finding the EVD. It forms the basis of most practical eigenvalue computations, at least until the matrix size approaches 10^4 or so.
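The following is a bare-bones sketch of an unshifted QR iteration, far simpler and slower to converge than the practical Francis algorithm, but it illustrates the idea: every iterate is similar to \mathbf{A}, and when the eigenvalue magnitudes are distinct the diagonal entries approach the eigenvalues. The test matrix here is constructed with known eigenvalues for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
Q0, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # random orthogonal matrix
A = Q0 @ np.diag([4.0, 2.0, 1.0, 0.5]) @ Q0.T       # known eigenvalues 4, 2, 1, 0.5

X = A.copy()
for _ in range(40):
    Q, R = np.linalg.qr(X)   # factor X = QR ...
    X = R @ Q                # ... then reverse: R Q = Q.T @ X @ Q, so X stays similar to A

print(np.diag(X))            # approaches [4, 2, 1, 0.5] as the iteration proceeds
```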

7.2.6 Exercises

Footnotes
  1. The terms “factorization” and “decomposition” are equivalent; they coexist mainly for historical reasons.

  2. In fact, the situation is reversed: eigenvalue methods are among the best ways to compute the roots of a given polynomial.

  3. The randn function generates random numbers from a standard normal distribution. In Python, it is found in the numpy.random module.