Inverse iteration - Fundamentals of Numerical Computation

Power iteration finds only the dominant eigenvalue. We next show that it can be adapted to find any eigenvalue, provided you start with a reasonably good estimate of it. Some simple linear algebra is all that is needed.

Proof

The equation $\mathbf{A}\mathbf{v}=\lambda \mathbf{v}$ implies that $(\mathbf{A}-s\mathbf{I})\mathbf{v} = \mathbf{A}\mathbf{v} - s\mathbf{I}\mathbf{v} = \lambda\mathbf{v} - s\mathbf{v} = (\lambda-s)\mathbf{v}$. That proves the first part of the theorem. For the second part, we note that by assumption, $(\mathbf{A}-s\mathbf{I})$ is nonsingular, so $(\mathbf{A}-s\mathbf{I})\mathbf{v} = (\lambda-s) \mathbf{v}$ implies that $\mathbf{v} = (\lambda-s) (\mathbf{A}-s\mathbf{I})^{-1} \mathbf{v}$, or $(\lambda-s)^{-1} \mathbf{v} =(\mathbf{A}-s\mathbf{I})^{-1} \mathbf{v}$. The discussion above also proves the third part of the theorem.
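The two eigenvalue relationships in the proof are easy to check numerically. The following is a minimal sketch, assuming NumPy is available; the test matrix, the shift, and the variable names `shifted` and `inverted` are chosen here purely for illustration. The explicit inverse is formed only to verify the statement, not as a recommended way to compute.

```python
import numpy as np

# Upper triangular test matrix, so its eigenvalues are the diagonal entries.
ev = np.array([1.0, -0.75, 0.6, -0.4, 0.0])
A = np.triu(np.ones((5, 5)), 1) + np.diag(ev)

s = 0.7                      # shift; not an eigenvalue, so A - s*I is nonsingular
B = A - s * np.eye(5)

# Eigenvalues of the shifted matrix and of its inverse (inverse used for checking only).
shifted = np.sort(np.linalg.eigvals(B).real)
inverted = np.sort(np.linalg.eigvals(np.linalg.inv(B)).real)

print(np.allclose(shifted, np.sort(ev - s)))          # eigenvalues of A - s*I are lambda - s
print(np.allclose(inverted, np.sort(1 / (ev - s))))   # eigenvalues of the inverse are 1/(lambda - s)
```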

Consider first part 2 of the theorem with $s=0$, and suppose that $\mathbf{A}$ has a smallest eigenvalue,

$$|\lambda_n| \ge |\lambda_{n-1}| \ge \cdots > |\lambda_1|. \tag{8.3.1}$$

Then clearly

$$|\lambda_1^{-1}| > |\lambda_{2}^{-1}| \ge \cdots \ge |\lambda_n^{-1}|, \tag{8.3.2}$$

and $\mathbf{A}^{-1}$ has a dominant eigenvalue. Hence, power iteration on $\mathbf{A}^{-1}$ can be used to find the eigenvalue of $\mathbf{A}$ closest to zero. For nonzero values of $s$, we suppose instead that there is an ordering

$$|\lambda_n-s| \ge \cdots \ge |\lambda_2-s| > |\lambda_1-s|. \tag{8.3.3}$$

Then it follows that

$$|\lambda_1-s|^{-1} > |\lambda_{2}-s|^{-1} \ge \cdots \ge |\lambda_n-s|^{-1}, \tag{8.3.4}$$

and power iteration on the matrix $(\mathbf{A}-s\mathbf{I})^{-1}$ converges to $(\lambda_1-s)^{-1}$, which is easily solved for $\lambda_1$ itself.
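Solving for $\lambda_1$ is a one-line calculation. Here $\mu$ is a symbol introduced only for this illustration, standing for the dominant eigenvalue of $(\mathbf{A}-s\mathbf{I})^{-1}$ that power iteration produces:

```latex
\mu = \frac{1}{\lambda_1 - s}
\quad\Longrightarrow\quad
\lambda_1 = s + \frac{1}{\mu}.
```

For instance, with $s=0.7$ and $\lambda_1=0.6$, we get $\mu = 1/(0.6-0.7) = -10$, and indeed $s + 1/\mu = 0.7 - 0.1 = 0.6$.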

8.3.1 Algorithm

A literal application of Definition 8.2.2 would include the step

$$\mathbf{y}_k = (\mathbf{A}-s\mathbf{I})^{-1} \mathbf{x}_k. \tag{8.3.5}$$

As always, however, we do not want to explicitly find the inverse of a matrix. Instead, we should implement this step as the solution of a linear system.

Each pass of inverse iteration requires the solution of a linear system of equations with the matrix $\mathbf{B}=\mathbf{A}-s\mathbf{I}$. This solution might use methods we consider later in this chapter. Here, we use (sparse) PLU factorization and hope for the best. Since the matrix $\mathbf{B}$ is constant, the factorization needs to be done only once for all iterations. The details are in Function 8.3.1.
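The "factor once, solve many times" idea can be sketched as follows. This is an illustration under stated assumptions, not the book's function: it uses dense `scipy.linalg.lu_factor` and `lu_solve` in place of a sparse factorization, and the test matrix, shift, and starting vector are chosen here for demonstration.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

# Illustrative data: a triangular matrix with known eigenvalues, and a shift.
ev = np.array([1.0, -0.75, 0.6, -0.4, 0.0])
A = np.triu(np.ones((5, 5)), 1) + np.diag(ev)
s = 0.7
B = A - s * np.eye(5)

factors = lu_factor(B)                 # O(n^3) work, done once before the loop

x = np.ones(5) / np.sqrt(5)            # unit starting vector
for k in range(5):
    y = lu_solve(factors, x)           # O(n^2) work per pass: two triangular solves
    residual = np.linalg.norm(B @ y - x)   # confirms that y solves B y = x
    beta = 1 / np.dot(x, y) + s        # eigenvalue estimate from this pass
    x = y / np.linalg.norm(y)          # normalize for the next pass

print(f"estimate after 5 passes: {beta:.4f}, last residual: {residual:.1e}")
```

For a dense $n\times n$ matrix, the factorization costs $O(n^3)$ operations while each pair of triangular solves costs $O(n^2)$, so reusing the factors makes every pass after the first comparatively cheap.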

Function 8.3.1 (inviter)

Julia (inviter.jl)

using LinearAlgebra    # provides lu, dot, norm, normalize, and I
"""
    inviter(A, s, numiter)

Perform `numiter` inverse iterations with the matrix `A` and shift
`s`, starting from a random vector. Returns a vector of
eigenvalue estimates and the final eigenvector approximation.
"""
function inviter(A, s, numiter)
    n = size(A, 1)
    x = normalize(randn(n))
    β = zeros(numiter)
    fact = lu(A - s * I)
    for k in 1:numiter
        y = fact \ x
        ⍺ = dot(x, y)
        β[k] = (1 / ⍺) + s
        x = y / norm(y)
    end
    return β, x
end

MATLAB (inviter.m)

function [beta, x] = inviter(A, s, numiter)
% INVITER   Shifted inverse iteration for the closest eigenvalue.
% Input:
%   A         square matrix
%   s         value close to targeted eigenvalue (complex scalar)
%   numiter   number of iterations
% Output: 
%   beta      sequence of eigenvalue approximations (vector)
%   x         final eigenvector approximation

    n = length(A);
    x = randn(n, 1);
    x = x / norm(x, inf);
    B = A - s * eye(n);
    [L, U] = lu(B);
    beta = zeros(numiter, 1);
    for k = 1:numiter
        y = U \ (L \ x);
        beta(k) = (1 / dot(x, y)) + s;
        x = y / norm(y);
    end
end

Python (inviter.py)

import numpy as np
from scipy.linalg import lu

def inviter(A, s, numiter):
    """
    inviter(A, s, numiter)

    Perform numiter inverse iterations with the matrix A and shift s, starting
    from a random vector, and return a vector of eigenvalue estimates and the final
    eigenvector approximation.
    """
    n = A.shape[0]
    x = np.random.randn(n)
    x = x / np.linalg.norm(x)
    beta = np.zeros(numiter)
    PL, U = lu(A - s * np.eye(n), permute_l=True)
    for k in range(numiter):
        y = np.linalg.solve(U, np.linalg.solve(PL, x))
        alpha = np.dot(x, y)
        beta[k] = (1 / alpha) + s
        x = y / np.linalg.norm(y)

    return beta, x

8.3.2 Convergence rate

The convergence is linear, at a rate found by reinterpreting Theorem 8.2.1 with $(\mathbf{A}-s\mathbf{I})^{-1}$ in place of $\mathbf{A}$. With the eigenvalues ordered as in (8.3.3), in the general case we have

$$\frac{\left|\beta_{k+1} - \lambda_1\right|}{\left|\beta_{k} - \lambda_1\right|} \rightarrow \left|\frac{ \lambda_1 - s } {\lambda_2 - s}\right| \quad \text{ as } \quad k\rightarrow \infty, \tag{8.3.7}$$

and in the Hermitian case, we have

$$\frac{\left|\beta_{k+1} - \lambda_1\right|}{\left|\beta_{k} - \lambda_1\right|} \rightarrow \left|\frac{ \lambda_1 - s } {\lambda_2 - s}\right|^2 \quad \text{ as } \quad k\rightarrow \infty. \tag{8.3.8}$$

Thus, the convergence is best when the shift $s$ is close to the target eigenvalue—specifically, when it is much closer to that eigenvalue than to any other.
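To get a feel for how strongly the shift affects the rate in (8.3.7), the limit can be evaluated directly when the eigenvalues are known. The helper `predicted_rate` below is hypothetical (it is not part of the book's code), and the eigenvalues and shifts are illustrative choices; in the Hermitian case (8.3.8), the returned value would be squared.

```python
import numpy as np

def predicted_rate(eigenvalues, s):
    """Limiting error ratio |lambda_1 - s| / |lambda_2 - s| from (8.3.7).

    Hypothetical helper: sorts the eigenvalues by distance to the shift s,
    as in (8.3.3), and returns the ratio of the two smallest distances.
    """
    dist = np.sort(np.abs(np.asarray(eigenvalues, dtype=float) - s))
    return dist[0] / dist[1]

ev = [1, -0.75, 0.6, -0.4, 0]
for s in (0.7, 0.65, 0.61):
    r = predicted_rate(ev, s)
    # Number of iterations m needed to gain one decimal digit: r**m = 0.1.
    m = np.log(0.1) / np.log(r)
    print(f"s = {s}: rate = {r:.4f}, about {m:.2f} iterations per digit")
```

Moving the shift from 0.7 toward the eigenvalue 0.6 shrinks the numerator of the ratio while barely changing the denominator, so each digit of accuracy costs fewer iterations.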

Example 8.3.1 (Convergence of inverse iteration)

Julia

We set up a $5\times 5$ triangular matrix with prescribed eigenvalues on its diagonal.

λ = [1, -0.75, 0.6, -0.4, 0]
# Make a triangular matrix with eigenvalues on the diagonal.
A = triu(ones(5, 5), 1) + diagm(λ)

5×5 Matrix{Float64}:
 1.0   1.0   1.0   1.0  1.0
 0.0  -0.75  1.0   1.0  1.0
 0.0   0.0   0.6   1.0  1.0
 0.0   0.0   0.0  -0.4  1.0
 0.0   0.0   0.0   0.0  0.0

We run inverse iteration with the shift $s=0.7$ and take the final estimate as our “exact” answer to observe the convergence.

s = 0.7
β, x = FNC.inviter(A, s, 30)
eigval = β[end]

0.5999999999999985

As expected, the eigenvalue that was found is the one closest to 0.7. The convergence is again linear.

using Plots
err = @. eigval - β
plot(0:28, abs.(err[1:end-1]);
    m=:o,  xlabel=L"k", 
    yaxis=(L"|\lambda_3-\beta_k|", :log10, [1e-16, 1]),
    title="Convergence of inverse iteration")

The observed linear convergence rate is found from the data.

@show observed_rate = err[22] / err[21];

observed_rate = err[22] / err[21] = -0.3332696393044612

We reorder the eigenvalues to enforce (8.3.3).

dist = @. abs(λ - s)
λ = λ[sortperm(dist)]

5-element Vector{Float64}:
  0.6
  1.0
  0.0
 -0.4
 -0.75

Hence the theoretical convergence rate is

@show theoretical_rate = (λ[1] - s) / (λ[2] - s);

theoretical_rate = (λ[1] - s) / (λ[2] - s) = -0.3333333333333332

MATLAB

We set up a $5\times 5$ triangular matrix with prescribed eigenvalues on its diagonal.

ev = [1, -0.75, 0.6, -0.4, 0];
A = triu(ones(5, 5), 1) + diag(ev);

We run inverse iteration with the shift $s=0.7$. The result should converge to the eigenvalue closest to 0.7, which we know to be 0.6 here.

s = 0.7;
[beta, x] = inviter(A, s, 30);
format short
beta(1:10)

The convergence is again linear.

err = 0.6 - beta;
semilogy(abs(err),'.-')
title('Convergence of inverse iteration')
xlabel('k'), ylabel(('|\lambda_j - \beta_k|'));

Let’s reorder the eigenvalues to enforce (8.3.3).

[~, idx] = sort(abs(ev - s));
ev = ev(idx)

Now it is easy to compare the theoretical and observed linear convergence rates.

theoretical_rate = (ev(1) - s) / (ev(2) - s)
observed_rate = err(26) / err(25)

Python

We set up a $5\times 5$ triangular matrix with prescribed eigenvalues on its diagonal.

ev = array([1, -0.75, 0.6, -0.4, 0])
A = triu(ones([5, 5]), 1) + diag(ev)    # triangular matrix, eigs on diagonal

We run inverse iteration with the shift $s=0.7$. Convergence should be to the eigenvalue closest to the shift, which we know to be 0.6 here.

beta, x = FNC.inviter(A, 0.7, 30)
print(beta)

[0.69701909 0.56665001 0.61093565 0.59636036 0.60121694 0.59959529
 0.60013508 0.599955   0.600015   0.599995   0.60000167 0.59999944
 0.60000019 0.59999994 0.60000002 0.59999999 0.6        0.6
 0.6        0.6        0.6        0.6        0.6        0.6
 0.6        0.6        0.6        0.6        0.6        0.6       ]

As expected, the eigenvalue that was found is the one closest to 0.7. The convergence is again linear.

err = beta[-1] - beta    # last estimate is our best
semilogy(arange(30), abs(err), "-o")
ylim(1e-16, 1)
xlabel("$k$"),  ylabel("$|\\lambda_3 - \\beta_k|$")
title(("Convergence of inverse iteration"));

Let’s reorder the eigenvalues to enforce (8.3.3).

ev = ev[argsort(abs(ev - 0.7))]
print(ev)

[ 0.6   1.    0.   -0.4  -0.75]

Now it is easy to compare the theoretical and observed linear convergence rates.

print(f"theory: {(ev[0] - 0.7) / (ev[1] - 0.7):.5f}")
print(f"observed: {err[21] / err[20]:.5f}")

theory: -0.33333
observed: -0.33326

8.3.3 Rayleigh quotient iteration

There is a clear opportunity for positive feedback in Definition 8.3.1. The convergence rate of inverse iteration improves as the shift gets closer to the true eigenvalue—and the algorithm computes improving eigenvalue estimates! Updating the shift to $s=\beta_k$ after each iteration greatly accelerates the convergence. You are asked to implement this algorithm in Exercise 8.3.6.

If the eigenvalues are ordered by distance to $s$, then (asymptotically) one step of inverse iteration reduces the error by the factor $|\lambda_1-s|/|\lambda_2-s|$. As $s \to\lambda_1$, the change in the denominator is negligible. So, if at one point the error $|\lambda_1-s|$ is about $\epsilon$, then the error in the next estimate is reduced by a factor $O(\epsilon)$, making it $O(\epsilon^2)$. That is, each step now squares the error, which is quadratic convergence.

Example 8.3.2 (Dynamic shift strategy)

Julia

λ = [1, -0.75, 0.6, -0.4, 0]
# Make a triangular matrix with eigenvalues on the diagonal.
A = triu(ones(5, 5), 1) + diagm(λ)

5×5 Matrix{Float64}:
 1.0   1.0   1.0   1.0  1.0
 0.0  -0.75  1.0   1.0  1.0
 0.0   0.0   0.6   1.0  1.0
 0.0   0.0   0.0  -0.4  1.0
 0.0   0.0   0.0   0.0  0.0

We begin with a shift $s=0.7$, which is closest to the eigenvalue 0.6.

s = 0.7
x = ones(5)
y = (A - s * I) \ x
β = x[1] / y[1] + s

0.7034813925570228

Note that the result is not yet any closer to the targeted 0.6. But we proceed (without being too picky about normalization here).

s = β
x = y / y[1]
y = (A - s * I) \ x
β = x[1] / y[1] + s

0.5612761406172997

Still not much apparent progress. However, in just a few more iterations the results are dramatically better.

for k in 1:4
    s = β
    x = y / y[1]
    y = (A - s * I) \ x
    @show β = x[1] / y[1] + s
end

β = x[1] / y[1] + s = 0.5964312884753865
β = x[1] / y[1] + s = 0.5999717091820104
β = x[1] / y[1] + s = 0.5999999978556353
β = x[1] / y[1] + s = 0.6

MATLAB

ev = [1, -0.75, 0.6, -0.4, 0];
A = triu(ones(5, 5), 1) + diag(ev);

We begin with a shift $s=0.7$, which is closest to the eigenvalue 0.6.

s = 0.7;
x = ones(5, 1);
y = (A - s * eye(5)) \ x;
beta = 1 / (x' * y) + s

Note that the result is not yet any closer to the targeted 0.6. But we proceed (without being too picky about normalization here).

s = beta;
x = y / norm(y);
y = (A - s * eye(5)) \ x;
beta = 1 / (x' * y) + s

Still not much apparent progress. However, in just a few more iterations the results are dramatically better.

format long
for k = 1:4
    s = beta;
    x = y / norm(y);
    y = (A - s * eye(5)) \ x;
    beta = 1 / (x' * y) + s
end

Python

ev = array([1, -0.75, 0.6, -0.4, 0])
A = triu(ones([5, 5]), 1) + diag(ev)    # triangular matrix, eigs on diagonal

We begin with a shift $s=0.7$, which is closest to the eigenvalue 0.6.

from numpy.linalg import solve
s = 0.7
x = ones(5)
y = solve(A - s * eye(5), x)
beta = x[0] / y[0] + s
print(f"latest estimate: {beta:.8f}")

latest estimate: 0.70348139

Note that the result is not yet any closer to the targeted 0.6. But we proceed (without being too picky about normalization here).

s = beta
x = y / y[0]
y = solve(A - s * eye(5), x)
beta = x[0] / y[0] + s
print(f"latest estimate: {beta:.8f}")

latest estimate: 0.56127614

Still not much apparent progress. However, in just a few more iterations the results are dramatically better.

for k in range(4):
    s = beta
    x = y / y[0]
    y = solve(A - s * eye(5), x)
    beta = x[0] / y[0] + s
    print(f"latest estimate: {beta:.12f}")

latest estimate: 0.596431288475
latest estimate: 0.599971709182
latest estimate: 0.599999997856
latest estimate: 0.600000000000
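The estimates printed in this example can be used to check the claim of quadratic convergence. The sketch below copies the five values that precede the final 0.6 from the Julia output of Example 8.3.2 and compares two ratios; the variable names are illustrative. If convergence is quadratic, the ratio of each error to the square of the previous one should stay bounded, while the ordinary ratio of successive errors should shrink toward zero.

```python
# Eigenvalue estimates copied from the Julia output of Example 8.3.2; the target is 0.6.
estimates = [0.7034813925570228, 0.5612761406172997, 0.5964312884753865,
             0.5999717091820104, 0.5999999978556353]
errors = [abs(b - 0.6) for b in estimates]

linear_ratios = []
quadratic_ratios = []
for e_old, e_new in zip(errors, errors[1:]):
    linear_ratios.append(e_new / e_old)        # tends to zero if faster than linear
    quadratic_ratios.append(e_new / e_old**2)  # stays bounded if quadratic
    print(f"error {e_new:.3e}   linear ratio {linear_ratios[-1]:.3e}   "
          f"quadratic ratio {quadratic_ratios[-1]:.3f}")
```

The quadratic ratios remain small single-digit numbers, of the same order as the value $1/|\lambda_2-\lambda_1| = 1/0.4 = 2.5$ that the heuristic argument suggests for this matrix, while the linear ratios drop by orders of magnitude from one step to the next.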

There is a price to pay for this improvement. The matrix of the linear system to be solved, $(\mathbf{A}-s\mathbf{I})$, now changes with each iteration. That means that we can no longer do just one LU factorization for the entire iteration. The speedup in convergence usually makes this tradeoff worthwhile, however.

In practice, power and inverse iteration are not as effective as the algorithms used by eigs, which are based on the mathematics described in the rest of this chapter. However, inverse iteration can be useful for turning an eigenvalue estimate into an eigenvector estimate.
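As a rough illustration of that last point, the sketch below starts from an eigenvalue estimate that is accurate to about six digits and applies two passes of inverse iteration to obtain an eigenvector. The matrix, the size of the perturbation, and the number of passes are assumptions made for this demonstration; the estimate is deliberately slightly off so that the shifted matrix is not exactly singular.

```python
import numpy as np

# Triangular test matrix whose eigenvalues are its diagonal entries.
ev = np.array([1.0, -0.75, 0.6, -0.4, 0.0])
A = np.triu(np.ones((5, 5)), 1) + np.diag(ev)

s = 0.6 + 1e-6             # eigenvalue estimate, slightly off so A - s*I is nonsingular
B = A - s * np.eye(5)

x = np.ones(5) / np.sqrt(5)    # arbitrary unit starting vector
for k in range(2):
    y = np.linalg.solve(B, x)  # one pass of inverse iteration
    x = y / np.linalg.norm(y)

# How well does x satisfy the eigenvector equation for the true eigenvalue 0.6?
residual = np.linalg.norm(A @ x - 0.6 * x)
print(f"residual of the eigenvector equation: {residual:.2e}")
```

Because the shift is so close to the eigenvalue, the convergence factor $|\lambda_1-s|/|\lambda_2-s|$ is tiny, and a couple of passes are typically enough.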

8.3.4 Exercises

Exercise 8.3.1

⌨ Use Function 8.3.1 to perform 10 iterations for the given matrix and shift. Compare the results quantitatively to the convergence given by (8.3.7).

(a) $\mathbf{A} = \begin{bmatrix} 1.1 & 1 \\ 0 & 2.1 \end{bmatrix}, \; s = 1 \qquad$ (b) $\mathbf{A} = \begin{bmatrix} 1.1 & 1 \\ 0 & 2.1 \end{bmatrix}, \; s = 2\qquad$

(c) $\mathbf{A} = \begin{bmatrix} 1.1 & 1 \\ 0 & 2.1 \end{bmatrix}, \; s = 1.6\qquad$ (d) $\mathbf{A} = \begin{bmatrix} 2 & 1 \\ 1 & 0 \end{bmatrix}, \; s = -0.33 \qquad$

(e) $\mathbf{A} = \begin{bmatrix} 6 & 5 & 4 \\ 5 & 4 & 3 \\ 4 & 3 & 2 \end{bmatrix}, \; s = 0.1$

Exercise 8.3.4

✍ When the shift $s$ is very close to an eigenvalue of $\mathbf{A}$, the matrix $\mathbf{A}-s\mathbf{I}$ is close to a singular matrix. But then (8.3.6) is a linear system with a badly conditioned matrix, which should create a lot of error in the numerical solution for $\mathbf{y}_k$. However, it happens that the error is mostly in the direction of the eigenvector we are looking for, as the following toy example illustrates.

Prove that $\displaystyle \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}$ has an eigenvalue at zero with associated eigenvector $\mathbf{v}=[-1,1]^T$. Suppose this matrix is perturbed slightly to $\displaystyle \mathbf{A} = \begin{bmatrix} 1 & 1 \\ 0 & \epsilon \end{bmatrix}$, and that $\mathbf{x}_k=[1,1]^T$ in (8.3.6). Show that once $\mathbf{y}_k$ is normalized by its infinity norm, the result is within $\epsilon$ of a multiple of $\mathbf{v}$.

Exercise 8.3.5

⌨ (Continuation of Exercise 8.2.3.) This exercise concerns the $n^2\times n^2$ sparse matrix defined by FNC.poisson(n) for integer $n$. It represents a lumped model of a vibrating square membrane held fixed around the edges.

(a) The eigenvalues of $\mathbf{A}$ closest to zero are approximately squares of the frequencies of vibration for the membrane. Using eigs, find the eigenvalue $\lambda_m$ closest to zero for $n=10,15,20,25$.

(b) For each $n$ in part (a), apply 50 steps of Function 8.3.1 with zero shift. On one graph, plot the four convergence curves $|\beta_k-\lambda_m|$ using a semi-log scale.

(c) Let v be the eigenvector (second output) found by Function 8.3.1 for $n=25$. Make a surface plot of the vibration mode by reshaping v into an $n\times n$ matrix.

Exercise 8.3.6

⌨ This problem explores the use of Rayleigh quotient iteration.

(a) Modify Function 8.3.1 to change the value of the shift $s$ to be the most recently computed value in the vector $\beta$. Note that the matrix B must also change with each iteration, so the LU factorization cannot be done just once.

(b) Define a $100\times 100$ matrix with values $k^2$ for $k=1,\ldots,100$ on the main diagonal and random values uniformly distributed between 0 and 1 on the first superdiagonal. (Since this matrix is triangular, the diagonal values are its eigenvalues.) Using an initial shift of $s=920$, apply Rayleigh quotient iteration. Determine which eigenvalue was found and make a table of the log10 of the errors in the iteration as a function of iteration number. (These should approximately double, until machine precision is reached, due to quadratic convergence.)

(c) Repeat part (b) using a different initial shift of your choice.