The power and inverse iterations have a flaw that seems obvious once it is pointed out. Given a seed vector $\mathbf{u}$, they produce a sequence of vectors $\mathbf{u}_1,\mathbf{u}_2,\ldots$ that are scalar multiples of $\mathbf{u},\mathbf{A}\mathbf{u},\mathbf{A}^{2}\mathbf{u},\ldots$, but only the most recent vector is used to produce an eigenvector estimate.
It stands to reason that we could do no worse, and perhaps much better, if we searched among all linear combinations of the vectors seen in the past. In other words, we seek a solution in the range (column space) of the matrix
$$
\mathbf{K}_m =
\begin{bmatrix}
\mathbf{u} & \mathbf{A}\mathbf{u} & \mathbf{A}^{2}\mathbf{u} & \cdots & \mathbf{A}^{m-1}\mathbf{u}
\end{bmatrix}. \tag{8.4.1}
$$

Definition 8.4.1 (Krylov matrix and subspace)
Given an $n\times n$ matrix $\mathbf{A}$ and an $n$-vector $\mathbf{u}$, the $m$th Krylov matrix is the $n\times m$ matrix (8.4.1). The range (i.e., column space) of this matrix is the $m$th Krylov subspace $\mathcal{K}_m$.
In general, we expect that the dimension of the Krylov subspace $\mathcal{K}_m$, which is the rank of $\mathbf{K}_m$, equals $m$, though it may be smaller.
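As a small illustration (not from the text), the following Julia sketch builds $\mathbf{K}_m$ for a random symmetric matrix and checks its rank; when the seed happens to be an eigenvector, the subspace dimension stalls at 1. The helper `krylov_matrix` is written just for this example.

```julia
using LinearAlgebra

# Build the n×m Krylov matrix K_m = [u  Au  A²u  ⋯  A^(m-1)u].
function krylov_matrix(A, u, m)
    K = zeros(length(u), m)
    K[:, 1] = u
    for j in 2:m
        K[:, j] = A * K[:, j-1]
    end
    return K
end

A = Symmetric(rand(6, 6))                  # symmetric, so its eigenvectors are real
u = rand(6)
@show rank(krylov_matrix(A, u, 4))         # generically equals 4

v = eigvecs(A)[:, 1]                       # seed with an eigenvector of A
@show rank(krylov_matrix(A, v, 4))         # every column is a multiple of v: rank 1
```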
8.4.1 Properties

As we have seen with the power iteration, part of the appeal of the Krylov matrix is that it can be generated in a way that fully exploits the sparsity of $\mathbf{A}$, simply through repeated matrix-vector multiplication. Furthermore, we have some important mathematical properties.
Theorem 8.4.1
Suppose $\mathbf{A}$ is $n\times n$, $0<m<n$, and a vector $\mathbf{u}$ is used to generate Krylov subspaces. If $\mathbf{x}\in\mathcal{K}_m$, then the following hold:

1. $\mathbf{x} = \mathbf{K}_m \mathbf{z}$ for some $\mathbf{z}\in\mathbb{C}^m$.
2. $\mathbf{x} \in \mathcal{K}_{m+1}$.
3. $\mathbf{A}\mathbf{x} \in \mathcal{K}_{m+1}$.

Proof. If $\mathbf{x}\in\mathcal{K}_m$, then for some coefficients $c_1,\ldots,c_m$,

$$
\mathbf{x} = c_1 \mathbf{u} + c_2 \mathbf{A}\mathbf{u} + \cdots + c_m \mathbf{A}^{m-1}\mathbf{u}.
$$

Thus let $\mathbf{z}=\begin{bmatrix} c_1 & \cdots & c_m \end{bmatrix}^T$. Also, $\mathbf{x}\in\mathcal{K}_{m+1}$, as we can add zero times $\mathbf{A}^{m}\mathbf{u}$ to the sum. Finally,

$$
\mathbf{A}\mathbf{x} = c_1 \mathbf{A}\mathbf{u} + c_2 \mathbf{A}^{2}\mathbf{u} + \cdots + c_m \mathbf{A}^{m}\mathbf{u} \in \mathcal{K}_{m+1}.
$$

8.4.2 Dimension reduction

The problems $\mathbf{A}\mathbf{x}=\mathbf{b}$ and $\mathbf{A}\mathbf{x}=\lambda\mathbf{x}$ are posed in a very high-dimensional space $\mathbb{R}^n$ or $\mathbb{C}^n$. One way to approximate them is to replace the full $n$-dimensional space with a much lower-dimensional $\mathcal{K}_m$ for $m\ll n$. This is the essence of the Krylov subspace approach.
For instance, we can interpret $\mathbf{A}\mathbf{x}_m\approx \mathbf{b}$ in the sense of linear least-squares—that is, using Theorem 8.4.1 to let $\mathbf{x}=\mathbf{K}_m\mathbf{z}$,

$$
\min_{\mathbf{x}\in\mathcal{K}_m} \| \mathbf{A}\mathbf{x}-\mathbf{b} \|
= \min_{\mathbf{z}\in\mathbb{C}^m} \| \mathbf{A}(\mathbf{K}_m\mathbf{z})-\mathbf{b} \|
= \min_{\mathbf{z}\in\mathbb{C}^m} \| (\mathbf{A}\mathbf{K}_m)\mathbf{z}-\mathbf{b} \|. \tag{8.4.4}
$$

The natural seed vector for $\mathcal{K}_m$ in this case is the vector $\mathbf{b}$. In the next example we try to implement (8.4.4). We do take one precaution: because the vectors $\mathbf{A}^{k}\mathbf{b}$ may become very large or small in norm, we normalize after each multiplication by $\mathbf{A}$, just as we did in the power iteration.
Example 8.4.1 (Conditioning of the Krylov matrix)
Julia. First we define a triangular matrix with known eigenvalues and a random vector $\mathbf{b}$.
λ = @. 10 + (1:100)
A = triu(rand(100, 100), 1) + diagm(λ)
b = rand(100);
Next we build up the first 30 Krylov matrices iteratively, using renormalization after each matrix-vector multiplication.
Km = [b zeros(100, 29)]
for m in 1:29
v = A * Km[:, m]
Km[:, m+1] = v / norm(v)
end
Now we solve least-squares problems for Krylov matrices of increasing dimension, recording the residual in each case.
resid = zeros(30)
for m in 1:30
z = (A * Km[:, 1:m]) \ b
x = Km[:, 1:m] * z
resid[m] = norm(b - A * x)
end
The linear system approximations show smooth linear convergence at first, but the convergence stagnates after only a few digits have been found.
using Plots
plot(0:29, resid; m=:o,
xaxis=(L"m"), yaxis=(:log10, L"\| b-Ax_m \|"),
title="Residual for linear systems", legend=:none)
MATLAB. First we define a triangular matrix $\mathbf{A}$ with known eigenvalues and a random vector $\mathbf{b}$.
lambda = 10 + (1:100);
A = diag(lambda) + triu(rand(100), 1);
b = rand(100, 1);
Next, we build up the first 30 Krylov matrices iteratively. In order to keep the columns from growing exponentially in norm, we normalize them as we go. This doesn’t affect the column space of the Krylov matrix, in principle at least.
Km = b;
for m = 1:29
v = A * Km(:, m);
Km(:, m+1) = v / norm(v);
end
Now we solve least-squares problems for Krylov matrices of increasing dimension, recording the residual in each case.
warning off
resid = zeros(30, 1);
for m = 1:30
z = (A * Km(:, 1:m)) \ b;
x = Km(:, 1:m) * z;
resid(m) = norm(b - A * x);
end
The linear system approximations show smooth linear convergence at first, but the convergence stagnates after only a few digits have been found.
clf
semilogy(resid, '.-')
xlabel('m'), ylabel('|| b-Ax_m ||')
set(gca,'ytick',10.^(-6:2:0))
axis tight, title('Residual for linear systems')
Python. First we define a triangular matrix with known eigenvalues and a random vector $\mathbf{b}$.
ev = 10 + arange(1, 101)
A = triu(random.rand(100, 100), 1) + diag(ev)
b = random.rand(100)
Next we build up the first 30 Krylov matrices iteratively, using renormalization after each matrix-vector multiplication.
Km = zeros([100, 30])
Km[:, 0] = b
for m in range(29):
v = A @ Km[:, m]
Km[:, m + 1] = v / norm(v)
Now we solve least-squares problems for Krylov matrices of increasing dimension, recording the residual in each case.
from numpy.linalg import lstsq
resid = zeros(30)
resid[0] = norm(b)
for m in range(1, 30):
z = lstsq(A @ Km[:, :m], b, rcond=None)[0]
x = Km[:, :m] @ z
resid[m] = norm(b - A @ x)
The linear system approximations show smooth linear convergence at first, but the convergence stagnates after only a few digits have been found.
semilogy(range(30), resid, "-o")
xlabel("$m$"), ylabel("$\\| b-Ax_m \\|$")
title(("Residual for linear systems"));
8.4.3 The Arnoldi iteration

The breakdown of convergence in Example 8.4.1 is due to a critical numerical defect in our approach: the columns of the Krylov matrix (8.4.1) increasingly become parallel to the dominant eigenvector, as (8.2.4) predicts, and therefore to one another. As we saw in The QR factorization, near-parallel vectors create the potential for numerical cancellation. This manifests as a large condition number for $\mathbf{K}_m$ as $m$ grows, eventually creating excessive error when solving the least-squares system.
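To see this alignment directly, here is a quick sketch (not from the text) that tracks the angle between successive normalized Krylov directions for the same style of matrix used above; the angles collapse toward zero within a few steps.

```julia
using LinearAlgebra

# Successive normalized Krylov directions, as in Example 8.4.1's matrix.
A = triu(rand(100, 100), 1) + diagm(10 .+ (1:100))
q = [normalize(rand(100))]
for k in 1:8
    push!(q, normalize(A * q[end]))
end

# Angle between consecutive directions; it shrinks rapidly as the
# columns align with the dominant eigenvector.
angles = [acos(clamp(abs(dot(q[k], q[k+1])), 0, 1)) for k in 1:8]
@show angles
```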
The polar opposite of an ill-conditioned basis for $\mathcal{K}_m$ is an orthonormal one. Suppose we had a thin QR factorization of $\mathbf{K}_m$:
$$
\mathbf{K}_m = \mathbf{Q}_m \mathbf{R}_m =
\begin{bmatrix}
\mathbf{q}_1 & \mathbf{q}_2 & \cdots & \mathbf{q}_m
\end{bmatrix}
\begin{bmatrix}
R_{11} & R_{12} & \cdots & R_{1m} \\
0 & R_{22} & \cdots & R_{2m} \\
\vdots & & \ddots & \\
0 & 0 & \cdots & R_{mm}
\end{bmatrix}.
$$

Then the vectors $\mathbf{q}_1,\ldots,\mathbf{q}_m$ are the orthonormal basis we seek for $\mathcal{K}_m$. By Theorem 8.4.1, we know that $\mathbf{A}\mathbf{q}_m \in \mathcal{K}_{m+1}$, and therefore

$$
\mathbf{A}\mathbf{q}_m = H_{1m}\,\mathbf{q}_1 + H_{2m}\,\mathbf{q}_2 + \cdots + H_{m+1,m}\,\mathbf{q}_{m+1} \tag{8.4.6}
$$

for some choice of the $H_{ij}$. Note that by using orthonormality, we have

$$
\mathbf{q}_i^* (\mathbf{A}\mathbf{q}_m) = H_{im}, \qquad i=1,\ldots,m. \tag{8.4.7}
$$

Since we started by assuming that we know $\mathbf{q}_1,\ldots,\mathbf{q}_m$, the only unknowns in (8.4.6) are $H_{m+1,m}$ and $\mathbf{q}_{m+1}$. But they appear only as a product, and we know that $\mathbf{q}_{m+1}$ is a unit vector, so they are uniquely defined (up to sign) by the other terms in the equation.
We can now proceed iteratively.
Algorithm 8.4.1 (Arnoldi iteration)
Given matrix $\mathbf{A}$ and vector $\mathbf{u}$:

Let $\mathbf{q}_1 = \mathbf{u} \,/\, \|\mathbf{u}\|$.

For $m=1,2,\ldots$:

a. Use (8.4.7) to find $H_{im}$ for $i=1,\ldots,m$.

b. Let
$$
\mathbf{v} = (\mathbf{A}\mathbf{q}_m) - H_{1m}\,\mathbf{q}_1 - H_{2m}\,\mathbf{q}_2 - \cdots - H_{mm}\,\mathbf{q}_m. \tag{8.4.8}
$$

c. Let $H_{m+1,m} = \|\mathbf{v}\|$.

d. Let $\mathbf{q}_{m+1} = \mathbf{v} \,/\, H_{m+1,m}$.

Return the $n\times(m+1)$ matrix $\mathbf{Q}_{m+1}$ whose columns are $\mathbf{q}_1,\dots,\mathbf{q}_{m+1}$, and the $(m+1)\times m$ matrix $\mathbf{H}_m$.
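For concreteness, the first pass ($m=1$) of the loop works out to

$$
\mathbf{q}_1 = \frac{\mathbf{u}}{\|\mathbf{u}\|}, \quad
H_{11} = \mathbf{q}_1^*(\mathbf{A}\mathbf{q}_1), \quad
\mathbf{v} = \mathbf{A}\mathbf{q}_1 - H_{11}\mathbf{q}_1, \quad
H_{21} = \|\mathbf{v}\|, \quad
\mathbf{q}_2 = \frac{\mathbf{v}}{H_{21}},
$$

and $\mathbf{q}_1^*\mathbf{v} = \mathbf{q}_1^*(\mathbf{A}\mathbf{q}_1) - H_{11} = 0$, so the new unit vector $\mathbf{q}_2$ is automatically orthogonal to $\mathbf{q}_1$.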
The vectors $\mathbf{q}_1,\dots,\mathbf{q}_m$ are an orthonormal basis for the space $\mathcal{K}_m$, which makes them ideal for numerical computations in that space. The matrix $\mathbf{H}_m$ plays an important role, too, and has a particular “triangular plus” structure,

$$
\mathbf{H}_m = \begin{bmatrix}
H_{11} & H_{12} & \cdots & H_{1m} \\
H_{21} & H_{22} & \cdots & H_{2m} \\
 & H_{32} & \ddots & \vdots \\
 & & \ddots & H_{mm} \\
 & & & H_{m+1,m}
\end{bmatrix}.
$$

The identity (8.4.6) used over all the iterations can be collected into a single matrix equation,

$$
\mathbf{A}\mathbf{Q}_m = \mathbf{Q}_{m+1} \mathbf{H}_m, \tag{8.4.10}
$$

which is a fundamental identity of Krylov subspace methods.
8.4.4 Implementation

An implementation of the Arnoldi iteration is given in Function 8.4.2. A careful inspection shows that the inner loop does not exactly implement (8.4.7) and (8.4.8). The reason is numerical stability. Though the described and implemented versions are mathematically equivalent in exact arithmetic (see Exercise 6), the approach in Function 8.4.2 is more stable.
Function 8.4.2 (Arnoldi iteration), Julia implementation:
"""
arnoldi(A, u, m)
Perform the Arnoldi iteration for `A` starting with vector `u`, out
to the Krylov subspace of degree `m`. Returns the orthonormal basis
(`m`+1 columns) and the upper Hessenberg `H` of size `m`+1 by `m`.
"""
function arnoldi(A, u, m)
n = length(u)
Q = zeros(n, m+1)
H = zeros(m+1, m)
Q[:, 1] = u / norm(u)
for j in 1:m
# Find the new direction that extends the Krylov subspace.
v = A * Q[:, j]
# Remove the projections onto the previous vectors.
for i in 1:j
H[i, j] = dot(Q[:, i], v)
v -= H[i, j] * Q[:, i]
end
# Normalize and store the new basis vector.
H[j+1, j] = norm(v)
Q[:, j+1] = v / H[j+1, j]
end
return Q, H
end
Function 8.4.2 (Arnoldi iteration), MATLAB implementation:
function [Q, H] = arnoldi(A, u, m)
% ARNOLDI Arnoldi iteration for Krylov subspaces.
% Input:
% A square matrix (n by n)
% u initial vector
% m number of iterations
% Output:
% Q orthonormal basis of Krylov space (n by m+1)
% H upper Hessenberg matrix, A*Q(:,1:m)=Q*H (m+1 by m)
n = length(A);
Q = zeros(n, m+1);
H = zeros(m+1, m);
Q(:, 1) = u / norm(u);
for j = 1:m
% Find the new direction that extends the Krylov subspace.
v = A * Q(:, j);
% Remove the projections onto the previous vectors.
for i = 1:j
H(i, j) = Q(:, i)' * v;
v = v - H(i,j) * Q(:,i);
end
% Normalize and store the new basis vector.
H(j+1, j) = norm(v);
Q(:, j+1) = v / H(j+1, j);
end
Function 8.4.2 (Arnoldi iteration), Python implementation:
import numpy as np

def arnoldi(A, u, m):
"""
arnoldi(A, u, m)
Perform the Arnoldi iteration for A starting with vector u, out to the Krylov
subspace of degree m. Return the orthonormal basis (m+1 columns) and the upper
Hessenberg H of size m+1 by m.
"""
n = u.size
Q = np.zeros([n, m + 1])
H = np.zeros([m + 1, m])
Q[:, 0] = u / np.linalg.norm(u)
for j in range(m):
# Find the new direction that extends the Krylov subspace.
v = A @ Q[:, j]
# Remove the projections onto the previous vectors.
for i in range(j + 1):
H[i, j] = Q[:, i] @ v
v -= H[i, j] * Q[:, i]
# Normalize and store the new basis vector.
H[j + 1, j] = np.linalg.norm(v)
Q[:, j + 1] = v / H[j + 1, j]
return Q, H
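The difference between (8.4.7)–(8.4.8) and the loop in Function 8.4.2 amounts to classical versus modified Gram–Schmidt orthogonalization. A small experiment (a sketch, not from the text; the helper `gram_schmidt` is written just for this comparison) suggests why the modified ordering holds up better when the vectors being orthogonalized are nearly dependent, as raw Krylov columns are.

```julia
using LinearAlgebra

# Orthonormalize the columns of V, projecting either against the original
# column (classical) or against the running remainder (modified).
function gram_schmidt(V; modified=true)
    n, m = size(V)
    Q = zeros(n, m)
    for j in 1:m
        v = V[:, j]
        for i in 1:j-1
            c = modified ? dot(Q[:, i], v) : dot(Q[:, i], V[:, j])
            v -= c * Q[:, i]
        end
        Q[:, j] = v / norm(v)
    end
    return Q
end

# Nearly dependent test vectors, like normalized raw Krylov columns.
A = triu(rand(100, 100), 1) + diagm(10 .+ (1:100))
V = reduce(hcat, [normalize(A^k * ones(100)) for k in 0:9])

for modified in (false, true)
    Q = gram_schmidt(V; modified=modified)
    println("modified = $modified:  loss of orthogonality = ", opnorm(Q' * Q - I))
end
```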
Example 8.4.2 (Arnoldi iteration)
Julia. Here again is the linear system from Example 8.4.1.
λ = @. 10 + (1:100)
A = triu(rand(100, 100), 1) + diagm(λ)
b = rand(100);
We can use $\mathbf{b}$ as the seed vector for the Arnoldi iteration.
Q, H = FNC.arnoldi(A, b, 30)
println("Q has size $(size(Q))")
println("H has size $(size(H))")
Here’s one validation of the key identity (8.4.10) .
should_be_near_zero = opnorm(A * Q[:, 1:20] - Q[:, 1:21] * H[1:21, 1:20])
Projecting the linear system onto a Krylov subspace via the Krylov matrix in Example 8.4.1 was unable to get the residual much smaller than about $10^{-4}$. But the Arnoldi basis gives us a stable way to work in that subspace and get better results.
z = (A * Q) \ b
x = Q * z
@show resid_norm = norm(b - A * x);
resid_norm = norm(b - A * x) = 2.3218556005790843e-9
MATLAB. Here again is the linear system from Example 8.4.1.
lambda = 10 + (1:100);
A = diag(lambda) + triu(rand(100), 1);
b = rand(100, 1);
We can use $\mathbf{b}$ as the seed vector for the Arnoldi iteration.
[Q, H] = arnoldi(A, b, 30);
disp(sprintf("Q is %d by %d", size(Q)))
disp(sprintf("H is %d by %d", size(H)))
Here’s one validation of the key identity (8.4.10) .
should_be_near_zero = norm(A * Q(:, 1:20) - Q(:, 1:21) * H(1:21, 1:20))
Projecting the linear system onto a Krylov subspace via the Krylov matrix in Example 8.4.1 was unable to get the residual much smaller than about $10^{-4}$. But the Arnoldi basis gives us a stable way to work in that subspace and get better results.
z = (A * Q) \ b;
x = Q * z;
resid_norm = norm(b - A * x)
Python. Here again is the linear system from Example 8.4.1.
ev = 10 + arange(1, 101)
A = triu(random.rand(100, 100), 1) + diag(ev)
b = random.rand(100);
We can use $\mathbf{b}$ as the seed vector for the Arnoldi iteration.
Q, H = FNC.arnoldi(A, b, 30)
print("Q has size", Q.shape)
print("H has size", H.shape)
Q has size (100, 31)
H has size (31, 30)
Here’s one validation of the key identity (8.4.10) .
from numpy.linalg import norm
should_be_near_zero = norm(A @ Q[:, :20] - Q[:, :21] @ H[:21, :20])
print(should_be_near_zero)
Projecting the linear system onto a Krylov subspace via the Krylov matrix in Example 8.4.1 was unable to get the residual much smaller than about $10^{-4}$. But the Arnoldi basis gives us a stable way to work in that subspace and get better results.
z, _, _, _ = linalg.lstsq(A @ Q, b)
x = Q @ z
resid_norm = norm(b - A @ x)
print(f"residual norm: {resid_norm:.2e}")
In the next section, we revisit the idea of approximately solving $\mathbf{A}\mathbf{x}=\mathbf{b}$ over a Krylov subspace as suggested in Example 8.4.2. A related idea explored in Exercise 7 is used to approximate the eigenvalue problem for $\mathbf{A}$, which is the approach that underlies `eigs` for sparse matrices.
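For reference, here is a minimal usage sketch in Julia; it assumes the Arpack.jl package, which exports an `eigs` routine for large sparse matrices, and the test matrix is made up for illustration.

```julia
using SparseArrays, Arpack

# A sparse test matrix: dominant diagonal plus a little random coupling.
A = spdiagm(0 => 10.0 .+ (1:100)) + sprand(100, 100, 0.02)

# Request the 3 eigenvalues of largest magnitude (Arnoldi-based under the hood).
λ, V = eigs(A; nev=3, which=:LM)
@show λ
```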
8.4.5 Exercises

1. ✍ Let
$$
\mathbf{A} = \begin{bmatrix}
0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0
\end{bmatrix}.
$$

(a) Find the Krylov matrix $\mathbf{K}_3$ for the seed vector $\mathbf{u}=\mathbf{e}_1$.

(b) Find $\mathbf{K}_3$ for the seed vector $\mathbf{u}=\begin{bmatrix}1; \: 1; \: 1; \: 1\end{bmatrix}$.
2. ⌨ For each matrix, make a table of the 2-norm condition numbers $\kappa(\mathbf{K}_m)$ for $m=1,\ldots,10$. Use a vector of all ones as the Krylov seed.

(a) The matrix from Example 8.4.1.

(b)
$$
\begin{bmatrix}
-2 & 1 & & & \\
1 & -2 & 1 & & \\
 & \ddots & \ddots & \ddots & \\
 & & 1 & -2 & 1 \\
 & & & 1 & -2
\end{bmatrix} \quad (100\times 100)
$$

(c)
$$
\begin{bmatrix}
-2 & 1 & & & 1 \\
1 & -2 & 1 & & \\
 & \ddots & \ddots & \ddots & \\
 & & 1 & -2 & 1 \\
1 & & & 1 & -2
\end{bmatrix} \quad (200\times 200)
$$
3. ✍ Show that if $\mathbf{x}\in\mathcal{K}_m$, then $\mathbf{x}=p(\mathbf{A})\mathbf{u}$ for a polynomial $p$ of degree at most $m-1$. (See (7.2.12) for applying a polynomial to a matrix.)
4. ✍ Compute the asymptotic flop requirements for Function 8.4.2. Assume that due to sparsity, a matrix-vector multiplication $\mathbf{A}\mathbf{u}$ requires only $cn$ flops for a constant $c$, rather than the usual $O(n^2)$.
5. ⌨ When Arnoldi iteration is performed on the Krylov subspace generated using the matrix
$$
\mathbf{A} = \begin{bmatrix} 2 & 1 & 1 & 0 \\ 1 & 3 & 1 & 0 \\ 0 & 1 & 3 & 1 \\ 0 & 1 & 1 & 2 \end{bmatrix},
$$
the results can depend strongly on the initial vector $\mathbf{u}$.

(a) Apply Function 8.4.2 for 3 iterations and output `Q` (which should be square) and `H` when using the following seed vectors.

(i) $\bigl[1,\,0,\,0,\,0\bigr]$   (ii) $\bigl[1,\,1,\,1,\,1\bigr]$   (iii) `rand(4)`

(b) Can you explain why case (ii) in part (a) cannot finish successfully? (Hint: What line(s) of the function can possibly return NaN when applied to finite values?)
6. ✍ As mentioned in the text, Function 8.4.2 does not compute $H_{ij}$ as defined by (8.4.7), but rather
$$
S_{ij} = \mathbf{q}_i^* \left( \mathbf{A}\mathbf{q}_j - S_{1j}\,\mathbf{q}_1 - \cdots - S_{i-1,j}\,\mathbf{q}_{i-1} \right)
$$
for $i=1,\ldots,j$. Show that $S_{ij}=H_{ij}$. (Hence the function is mathematically equivalent to our Arnoldi formulas.)
7. One way to approximate the eigenvalue problem $\mathbf{A}\mathbf{x}=\lambda\mathbf{x}$ is to restrict $\mathbf{x}$ to the low-dimensional space $\mathcal{K}_m$.

(a) ✍ Show starting from (8.4.10) that
$$
\mathbf{Q}_m^* \mathbf{A} \mathbf{Q}_m = \tilde{\mathbf{H}}_m,
$$
where $\tilde{\mathbf{H}}_m$ is the upper Hessenberg matrix resulting from deleting the last row of $\mathbf{H}_m$. What is the size of this matrix?

(b) ✍ Show the reasoning above leads to the approximate eigenvalue problem $\tilde{\mathbf{H}}_m\mathbf{z} \approx \lambda\mathbf{z}$. (Hint: Start with $\mathbf{A}\mathbf{x} \approx \lambda\mathbf{x}$, and let $\mathbf{x}=\mathbf{Q}_m\mathbf{z}$ before applying part (a).)

(c) ⌨ Apply Function 8.4.2 to the matrix of Example 8.4.1 using a random seed vector. Compute eigenvalues of $\tilde{\mathbf{H}}_m$ for $m=1,\ldots,40$, keeping track in each case of the error between the largest of those values (in magnitude) and the largest eigenvalue of $\mathbf{A}$. Make a log-linear graph of the error as a function of $m$.