
Eigenvalue decomposition

To this point we have dealt frequently with the solution of the linear system \mathbf{A}\mathbf{x}=\mathbf{b}. Alongside this problem in its importance to linear algebra is the eigenvalue problem.

7.2.1 Complex matrices

A matrix with real entries can have complex eigenvalues. Therefore, we assume all matrices, vectors, and scalars may be complex in what follows. Recall that a complex number can be represented as a+ib for real a and b, where i^2=-1. The complex conjugate of x=a+ib is denoted \bar{x} and is given by \bar{x}=a-ib. The magnitude or modulus of a complex number z is

|z| = \sqrt{z\cdot \bar{z}}.

For the most part, “adjoint” replaces “transpose,” “hermitian” replaces “symmetric,” and “unitary matrix” replaces “orthogonal matrix” when applying our previous results to complex matrices.
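As a minimal NumPy sketch of these definitions (the matrix below is made up for illustration), the conjugate, modulus, and adjoint look like this:

```python
import numpy as np

z = 2 - 3j                                        # the complex scalar 2 - 3i
print(z.conjugate())                              # complex conjugate: (2+3j)
print(abs(z), np.sqrt((z * z.conjugate()).real))  # modulus, computed two ways

A = np.array([[1 + 2j, 3], [4j, 5]])
A_adj = A.conj().T               # adjoint (conjugate transpose) replaces transpose
H = A + A_adj                    # H equals its own adjoint, so it is hermitian
print(np.allclose(H, H.conj().T))  # True
```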

7.2.2 Eigenvalue decomposition

An easy rewrite of the eigenvalue definition (7.2.1) is that (\mathbf{A} - \lambda\mathbf{I}) \mathbf{x} = \boldsymbol{0}. Hence, (\mathbf{A} - \lambda\mathbf{I}) is singular, and it therefore must have a zero determinant. This is the property most often used to compute eigenvalues by hand.

The determinant \det(\mathbf{A} - \lambda \mathbf{I}) is called the characteristic polynomial. Its roots are the eigenvalues, so we know that an n\times n matrix has n eigenvalues, counting algebraic multiplicity.
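For a small made-up example, we can compare the roots of the characteristic polynomial with the eigenvalues returned by numpy.linalg.eigvals; numpy.poly returns the coefficients of the characteristic polynomial when given a square matrix. This is only a sketch for illustration, not a recommended computational method (see the discussion at the end of this section).

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])

# Coefficients of det(A - lambda*I), highest degree first
coeffs = np.poly(A)          # here: lambda^2 - 4*lambda + 3
print(np.roots(coeffs))      # roots of the characteristic polynomial: 3 and 1
print(np.linalg.eigvals(A))  # eigenvalues computed directly (same values, order may differ)
```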

Suppose that \mathbf{A}\mathbf{v}_k=\lambda_k\mathbf{v}_k for k=1,\ldots,n. We can summarize these as

\begin{split} \begin{bmatrix} \mathbf{A}\mathbf{v}_1 & \mathbf{A}\mathbf{v}_2 & \cdots & \mathbf{A}\mathbf{v}_n \end{bmatrix} &= \begin{bmatrix} \lambda_1 \mathbf{v}_1 & \lambda_2\mathbf{v}_2 & \cdots & \lambda_n \mathbf{v}_n \end{bmatrix}, \\[1mm] \mathbf{A} \begin{bmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_n \end{bmatrix} &= \begin{bmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_n \end{bmatrix} \begin{bmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{bmatrix}, \end{split}

which we write as

\mathbf{A} \mathbf{V} = \mathbf{V} \mathbf{D}.

If we find that \mathbf{V} is a nonsingular matrix, then we arrive at a key factorization.[1]
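A quick numerical sketch with numpy.linalg.eig, which returns the eigenvalues and a matrix \mathbf{V} whose columns are eigenvectors (the test matrix here is randomly generated for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

lam, V = np.linalg.eig(A)    # eigenvalues and matrix of eigenvectors
D = np.diag(lam)

print(np.allclose(A @ V, V @ D))                 # A V = V D
print(np.allclose(A, V @ D @ np.linalg.inv(V)))  # A = V D V^{-1}, since V is nonsingular here
```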

Observe that if \mathbf{A}\mathbf{v} = \lambda \mathbf{v} for nonzero \mathbf{v}, then the equation remains true for any nonzero multiple of \mathbf{v}. Therefore, eigenvectors are not unique: any nonzero scalar multiple of an eigenvector is also an eigenvector for the same eigenvalue, and hence neither is the EVD unique.

We stress that while (7.2.6) is possible for all square matrices, (7.2.7) is not. One simple example of a nondiagonalizable matrix is

\mathbf{B} = \begin{bmatrix} 1 & 1\\0 & 1 \end{bmatrix}.
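For this particular \mathbf{B}, numpy.linalg.eig still returns candidate eigenvectors, but the two columns it produces are numerically parallel, so the resulting matrix is singular and no EVD exists; a small sketch:

```python
import numpy as np

B = np.array([[1.0, 1.0], [0.0, 1.0]])
lam, V = np.linalg.eig(B)

print(lam)                 # eigenvalue 1 with algebraic multiplicity 2
print(V)                   # the two columns are (nearly) parallel
print(np.linalg.cond(V))   # enormous condition number: V is numerically singular
```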

There is a common circumstance in which we can guarantee an EVD exists: when the n eigenvalues of an n\times n matrix are all distinct, the matrix is diagonalizable. The proof of this fact can be found in many elementary texts on linear algebra.

7.2.3 Similarity and matrix powers

The particular relationship between matrices \mathbf{A} and \mathbf{D} in (7.2.7) is important: if \mathbf{S} is any nonsingular matrix, we say that \mathbf{B}=\mathbf{S}\mathbf{A}\mathbf{S}^{-1} is similar to \mathbf{A}.

Hence, an EVD transforms \mathbf{A} to a similar matrix that happens to be diagonal, which is as simple as a matrix gets.

One way to interpret similarity is via change of basis (see Observation A.5):

\mathbf{B}\mathbf{x} = \mathbf{S}\mathbf{A}\mathbf{S}^{-1} \mathbf{x} = \underbrace{\mathbf{S} \underbrace{ \Bigl(\mathbf{A} \underbrace{\left( \mathbf{S}^{-1} \mathbf{x}\right)}_{\text{into $S$-basis}}\Bigr)}_{\text{apply $\mathbf{A}$}}}_{\text{out of $S$-basis}} .

That is, \mathbf{A} and \mathbf{B} represent the same linear transformation in different bases.

A similarity transformation does not change eigenvalues, a fact that is typically proved in elementary linear algebra texts.
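A quick numerical check of this fact, using an arbitrary (randomly generated, almost surely invertible) matrix \mathbf{S}:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
S = rng.standard_normal((4, 4))          # almost surely invertible

B = S @ A @ np.linalg.inv(S)             # B is similar to A

# Same eigenvalues, possibly listed in a different order
print(np.sort_complex(np.linalg.eigvals(A)))
print(np.sort_complex(np.linalg.eigvals(B)))
```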

The EVD is especially useful for matrix powers. To begin,

\mathbf{A}^2=(\mathbf{V}\mathbf{D}\mathbf{V}^{-1})(\mathbf{V}\mathbf{D}\mathbf{V}^{-1})=\mathbf{V}\mathbf{D}(\mathbf{V}^{-1}\mathbf{V})\mathbf{D}\mathbf{V}^{-1}=\mathbf{V}\mathbf{D}^2\mathbf{V}^{-1}.

Multiplying this result by \mathbf{A} repeatedly, we find that

\mathbf{A}^k = \mathbf{V}\mathbf{D}^k\mathbf{V}^{-1}.

Because \mathbf{D} is diagonal, its power \mathbf{D}^k is just the diagonal matrix of the kth powers of the eigenvalues.
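A sketch comparing (7.2.11) with direct computation via numpy.linalg.matrix_power, for a randomly generated (and therefore almost surely diagonalizable) matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))

lam, V = np.linalg.eig(A)
k = 5

# A^k = V D^k V^{-1}; D^k is just the elementwise kth powers of the eigenvalues
Ak = V @ np.diag(lam**k) @ np.linalg.inv(V)

print(np.allclose(Ak, np.linalg.matrix_power(A, k)))   # True, up to roundoff
```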

Furthermore, given a polynomial p(z)=c_0+c_1 z + \cdots + c_m z^m, we can apply the polynomial to the matrix in a straightforward way,

p(\mathbf{A}) = c_0\mathbf{I} +c_1 \mathbf{A} + \cdots + c_m \mathbf{A}^m.

Applying (7.2.11) leads to

\begin{split} p(\mathbf{A}) & = c_0\mathbf{V}\mathbf{V}^{-1} +c_1 \mathbf{V}\mathbf{D}\mathbf{V}^{-1} + \cdots + c_m \mathbf{V}\mathbf{D}^m\mathbf{V}^{-1} \\ &= \mathbf{V} \cdot [ c_0\mathbf{I} +c_1 \mathbf{D} + \cdots + c_m \mathbf{D}^m] \cdot \mathbf{V}^{-1} \\[1mm] &= \mathbf{V} \cdot \begin{bmatrix} p(\lambda_1) & & & \\ & p(\lambda_2) & & \\ & & \ddots & \\ & & & p(\lambda_n) \end{bmatrix} \cdot \mathbf{V}^{-1}. \end{split}
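A sketch of (7.2.13) for the made-up polynomial p(z) = 1 + 2z + 3z^2, checked against direct evaluation of p(\mathbf{A}):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3))
lam, V = np.linalg.eig(A)

c = [1.0, 2.0, 3.0]          # p(z) = 1 + 2z + 3z^2

# Direct evaluation: p(A) = c0*I + c1*A + c2*A^2
pA_direct = c[0] * np.eye(3) + c[1] * A + c[2] * (A @ A)

# Via the EVD: p(A) = V diag(p(lambda_i)) V^{-1}
p_lam = c[0] + c[1] * lam + c[2] * lam**2
pA_evd = V @ np.diag(p_lam) @ np.linalg.inv(V)

print(np.allclose(pA_direct, pA_evd))   # True, up to roundoff
```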

Finally, given the convergence of Taylor polynomials to common functions, we are able to apply a function f to a square matrix by replacing p with f in (7.2.13).
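For instance, taking f to be the exponential function, we can compare the EVD-based evaluation with scipy.linalg.expm (this sketch assumes SciPy is available and that \mathbf{A} is diagonalizable with a reasonably well-conditioned \mathbf{V}):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3))
lam, V = np.linalg.eig(A)

# f(A) = V diag(f(lambda_i)) V^{-1}, with f = exp
expA = V @ np.diag(np.exp(lam)) @ np.linalg.inv(V)

print(np.allclose(expA, expm(A)))   # True, up to roundoff
```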

7.2.4 Conditioning of eigenvalues

Just as linear systems have condition numbers that quantify the effect of finite precision, eigenvalue problems may be poorly conditioned too. While many possible results can be derived, we will use just one, the Bauer–Fike theorem.

The Bauer–Fike theorem tells us that eigenvalues can be perturbed by an amount that is \kappa(\mathbf{V}) times larger than perturbations to the matrix. This result is a bit less straightforward than it might seem: eigenvectors are not unique, so there are multiple possible values for \kappa(\mathbf{V}). Even so, the theorem indicates caution when a matrix has eigenvectors that form an ill-conditioned matrix. The limiting case of \kappa(\mathbf{V})=\infty might be interpreted as indicating a nondiagonalizable matrix \mathbf{A}. The other extreme is also of interest: \kappa(\mathbf{V})=1, which implies that \mathbf{V} is unitary.
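A numerical sketch of the bound referenced as (7.2.14), using a randomly generated matrix and a small random perturbation (both made up for illustration): every eigenvalue of the perturbed matrix should lie within \kappa(\mathbf{V})\,\|\mathbf{E}\| of some eigenvalue of \mathbf{A}, measured here in the 2-norm.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((5, 5))
lam, V = np.linalg.eig(A)

E = 1e-6 * rng.standard_normal((5, 5))       # small perturbation of A
lam_pert = np.linalg.eigvals(A + E)

# Distance from each perturbed eigenvalue to the nearest eigenvalue of A
dist = np.array([np.min(np.abs(lam - mu)) for mu in lam_pert])

bound = np.linalg.cond(V, 2) * np.linalg.norm(E, 2)
print(dist.max() <= bound)                   # True, as the theorem guarantees
```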

As we will see in Symmetry and definiteness, hermitian and real symmetric matrices are normal. Since the condition number of a unitary matrix is equal to 1, (7.2.14) guarantees that a perturbation of a normal matrix changes the eigenvalues by the same amount or less.

7.2.5 Computing the EVD

Roots of the characteristic polynomial are not used in numerical methods for finding eigenvalues.[2] Practical algorithms for computing the EVD go beyond the scope of this book. The essence of the matter is the connection to matrix powers indicated in (7.2.11). (We will see much more about the importance of matrix powers in Chapter 8.)

If the eigenvalues have different complex magnitudes, then as kk\to\infty the entries on the diagonal of Dk\mathbf{D}^k become increasingly well separated and easy to pick out. It turns out that there is an astonishingly easy and elegant way to accomplish this separation without explicitly computing the matrix powers.

The process demonstrated in Example 7.2.4 is known as the Francis QR iteration, and it can be formulated as an O(n^3) algorithm for finding the EVD. It forms the basis of most practical eigenvalue computations, at least until the matrix size approaches 10^4 or so.
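The following is a bare-bones sketch of an unshifted QR iteration, far simpler and slower to converge than the practical Francis algorithm, but it illustrates the idea: every iterate is similar to \mathbf{A}, and when the eigenvalue magnitudes are distinct the diagonal entries approach the eigenvalues. The test matrix here is constructed with known eigenvalues for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
Q0, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # random orthogonal matrix
A = Q0 @ np.diag([4.0, 2.0, 1.0, 0.5]) @ Q0.T       # known eigenvalues 4, 2, 1, 0.5

X = A.copy()
for _ in range(40):
    Q, R = np.linalg.qr(X)   # factor X = QR ...
    X = R @ Q                # ... then reverse: R Q = Q.T @ X @ Q, so X stays similar to A

print(np.diag(X))            # approaches [4, 2, 1, 0.5] as the iteration proceeds
```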

7.2.6 Exercises

Footnotes
  1. The terms “factorization” and “decomposition” are equivalent; they coexist mainly for historical reasons.

  2. In fact, the situation is reversed: eigenvalue methods are among the best ways to compute the roots of a given polynomial.

  3. The randn function generates random numbers from a standard normal distribution. In Python, it is found in the numpy.random module.