
Krylov subspaces

The power and inverse iterations have a flaw that seems obvious once it is pointed out. Given a seed vector $\mathbf{u}$, they produce a sequence of vectors $\mathbf{u}_1,\mathbf{u}_2,\ldots$ that are scalar multiples of $\mathbf{u},\mathbf{A}\mathbf{u},\mathbf{A}^{2}\mathbf{u},\ldots$, but only the most recent vector is used to produce an eigenvector estimate.

It stands to reason that we could do no worse, and perhaps much better, if we searched among all linear combinations of the vectors seen in the past. In other words, we seek a solution in the range (column space) of the matrix

$$
\mathbf{K}_m = \begin{bmatrix} \mathbf{u} & \mathbf{A}\mathbf{u} & \mathbf{A}^{2} \mathbf{u} & \cdots & \mathbf{A}^{m-1} \mathbf{u} \end{bmatrix}.
$$

In general, we expect that the dimension of the Krylov[1] subspace $\mathcal{K}_m$, which is the rank of $\mathbf{K}_m$, equals $m$, though it may be smaller.

8.4.1 Properties

As we have seen with the power iteration, part of the appeal of the Krylov matrix is that it can be generated in a way that fully exploits the sparsity of $\mathbf{A}$, simply through repeated matrix-vector multiplication. Furthermore, we have some important mathematical properties.
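To make this concrete, here is a minimal sketch (not code from the book) that builds the Krylov matrix by repeated matrix-vector products; the function name `krylov_matrix` and the random sparse test matrix are illustrative assumptions.

```julia
using SparseArrays, LinearAlgebra

# Build the Krylov matrix K_m = [u  Au  A^2 u  ...  A^(m-1) u] using only
# matrix-vector products, so a sparse A is never densified.
function krylov_matrix(A, u, m)
    n = length(u)
    K = zeros(eltype(u), n, m)
    K[:, 1] = u
    for j in 2:m
        K[:, j] = A * K[:, j-1]    # one sparse mat-vec per new column
    end
    return K
end

A = sprandn(100, 100, 0.05)        # an assumed random sparse test matrix
u = ones(100)
K = krylov_matrix(A, u, 10)
@show rank(K)                      # typically equals m, though it may be smaller
```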

8.4.2 Dimension reduction

The problems $\mathbf{A}\mathbf{x}=\mathbf{b}$ and $\mathbf{A}\mathbf{x}=\lambda\mathbf{x}$ are statements about a very high-dimensional space $\mathbb{C}^n$. One way to approximate them is to replace the full $n$-dimensional space with a much lower-dimensional $\mathcal{K}_m$ for $m\ll n$. This is the essence of the Krylov subspace approach.

For instance, we can interpret $\mathbf{A}\mathbf{x}_m\approx \mathbf{b}$ in the sense of linear least-squares; that is, using Theorem 8.4.1 to let $\mathbf{x}=\mathbf{K}_m\mathbf{z}$,

$$
\min_{\mathbf{x}\in\mathcal{K}_m} \| \mathbf{A}\mathbf{x}-\mathbf{b} \| = \min_{\mathbf{z}\in\mathbb{C}^m} \| \mathbf{A}(\mathbf{K}_m\mathbf{z})-\mathbf{b} \| = \min_{\mathbf{z}\in\mathbb{C}^m} \| (\mathbf{A}\mathbf{K}_m)\mathbf{z}-\mathbf{b} \|.
$$

The natural seed vector for $\mathcal{K}_m$ in this case is the vector $\mathbf{b}$. In the next example we try to implement (8.4.4). We do take one precaution: because the vectors $\mathbf{A}^{k}\mathbf{b}$ may become very large or small in norm, we normalize after each multiplication by $\mathbf{A}$, just as we did in the power iteration.
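Demo 8.4.1 itself is not reproduced here, but a rough sketch of the idea, including the normalization precaution, might look like the following; the test matrix, the value of $m$, and all variable names are assumptions made for illustration only.

```julia
using LinearAlgebra

n = 200
A = triu(rand(n, n), -1)           # an assumed stand-in matrix, not Demo 8.4.1's
b = rand(n)

m = 30
K = zeros(n, m)                    # Krylov matrix with normalized columns
K[:, 1] = b / norm(b)
for j in 2:m
    v = A * K[:, j-1]
    K[:, j] = v / norm(v)          # normalize to avoid overflow/underflow
end

z = (A * K) \ b                    # least-squares problem (8.4.4) for z
x = K * z                          # candidate solution in the Krylov subspace
@show norm(A * x - b) / norm(b)    # residual; degrades as K becomes ill-conditioned
```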

8.4.3 The Arnoldi iteration

The breakdown of convergence in Demo 8.4.1 is due to a critical numerical defect in our approach: the columns of the Krylov matrix (8.4.1) increasingly become parallel to the dominant eigenvector, as (8.2.4) predicts, and therefore to one another. As we saw in The QR factorization, near-parallel vectors create the potential for numerical cancellation. This manifests as a large condition number for $\mathbf{K}_m$ as $m$ grows, eventually creating excessive error when solving the least-squares system.
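This growth is easy to observe directly. The brief sketch below, with an arbitrary random matrix standing in for the one in Demo 8.4.1, prints the 2-norm condition number of the column-normalized Krylov matrix as $m$ increases.

```julia
using LinearAlgebra

# Watch the conditioning of the (column-normalized) Krylov matrix degrade as m grows.
function krylov_conditioning(A, u, mmax)
    K = reshape(u / norm(u), :, 1)
    for m in 2:mmax
        v = A * K[:, end]
        K = [K v / norm(v)]        # append the next normalized Krylov vector
        println("m = $m:  cond(K) = ", cond(K))
    end
end

krylov_conditioning(randn(100, 100), ones(100), 12)   # assumed stand-in matrix and seed
```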

The polar opposite of an ill-conditioned basis for $\mathcal{K}_m$ is an orthonormal one. Suppose we had a thin QR factorization of $\mathbf{K}_m$:

$$
\begin{align*}
\mathbf{K}_m = \mathbf{Q}_m \mathbf{R}_m & = \begin{bmatrix} \mathbf{q}_1 & \mathbf{q}_2 & \cdots & \mathbf{q}_m \end{bmatrix}
\begin{bmatrix} R_{11} & R_{12} & \cdots & R_{1m} \\ 0 & R_{22} & \cdots & R_{2m} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & R_{mm} \end{bmatrix}.
\end{align*}
$$

Then the vectors $\mathbf{q}_1,\ldots,\mathbf{q}_m$ are the orthonormal basis we seek for $\mathcal{K}_m$. By Theorem 8.4.1, we know that $\mathbf{A}\mathbf{q}_m \in \mathcal{K}_{m+1}$, and therefore

$$
\mathbf{A} \mathbf{q}_m = H_{1m} \, \mathbf{q}_1 + H_{2m} \, \mathbf{q}_2 + \cdots + H_{m+1,m}\,\mathbf{q}_{m+1}
$$

for some choice of the $H_{ij}$. Note that by using orthonormality, we have

$$
\mathbf{q}_i^* (\mathbf{A}\mathbf{q}_m) = H_{im}, \qquad i=1,\ldots,m.
$$

Since we started by assuming that we know $\mathbf{q}_1,\ldots,\mathbf{q}_m$, the only unknowns in (8.4.6) are $H_{m+1,m}$ and $\mathbf{q}_{m+1}$. But they appear only as a product, and we know that $\mathbf{q}_{m+1}$ is a unit vector, so they are uniquely defined (up to sign) by the other terms in the equation.
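In other words, each step removes the known components from $\mathbf{A}\mathbf{q}_m$ and normalizes what remains (a restatement of the update implied above, not a quotation of the book's numbered equations):

$$
\mathbf{v} = \mathbf{A}\mathbf{q}_m - \sum_{i=1}^{m} H_{im}\,\mathbf{q}_i, \qquad H_{m+1,m} = \|\mathbf{v}\|, \qquad \mathbf{q}_{m+1} = \frac{\mathbf{v}}{H_{m+1,m}}.
$$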

We can now proceed iteratively.

The Arnoldi iteration finds nested orthonormal bases for a family of nested Krylov subspaces.

8.4.4 Key identity

Up to now we have focused only on finding the orthonormal basis that lies in the columns of $\mathbf{Q}_m$. But the $H_{ij}$ values found during the iteration are also important. Taking $j=1,2,\ldots,m$ in (8.4.6) leads to

$$
\begin{split}
\mathbf{A}\mathbf{Q}_m &= \begin{bmatrix} \mathbf{A}\mathbf{q}_1 & \cdots & \mathbf{A}\mathbf{q}_m \end{bmatrix} \\
&= \begin{bmatrix} \mathbf{q}_1 & \mathbf{q}_2 & \cdots & \mathbf{q}_{m+1} \end{bmatrix}
\begin{bmatrix}
H_{11} & H_{12} & \cdots & H_{1m} \\
H_{21} & H_{22} & \cdots & H_{2m} \\
 & H_{32} & \ddots & \vdots \\
 & & \ddots & H_{mm} \\
 & & & H_{m+1,m}
\end{bmatrix}
= \mathbf{Q}_{m+1} \mathbf{H}_m,
\end{split}
$$

where the matrix $\mathbf{H}_m$ has a particular "triangular plus one" structure: it is upper Hessenberg, meaning every entry below its first subdiagonal is zero.

Equation (8.4.9) is a fundamental identity of Krylov subspace methods.

8.4.5 Implementation

An implementation of the Arnoldi iteration is given in Function 8.4.2. A careful inspection shows that the loop starting at line 17 does not exactly implement (8.4.7) and (8.4.8). The reason is numerical stability. Though the described and implemented versions are mathematically equivalent in exact arithmetic (see Exercise 6), the approach in Function 8.4.2 is more stable.
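Function 8.4.2 is not reproduced here, but the following is a minimal sketch of an Arnoldi routine in the spirit just described, using the subtract-as-you-go (modified Gram-Schmidt) update rather than (8.4.7) verbatim; the name `arnoldi` and its interface are assumptions, not the book's.

```julia
using LinearAlgebra

# Minimal Arnoldi sketch: m steps from seed u, returning Q (n×(m+1), orthonormal
# columns) and H ((m+1)×m, upper Hessenberg) with A*Q[:, 1:m] ≈ Q*H.
function arnoldi(A, u, m)
    n = length(u)
    Q = zeros(n, m + 1)
    H = zeros(m + 1, m)
    Q[:, 1] = u / norm(u)
    for j in 1:m
        v = A * Q[:, j]
        for i in 1:j
            H[i, j] = dot(Q[:, i], v)      # coefficient against q_i
            v -= H[i, j] * Q[:, i]         # remove that component before the next one
        end
        H[j+1, j] = norm(v)
        Q[:, j+1] = v / H[j+1, j]          # breaks down (NaN) if v is the zero vector
    end
    return Q, H
end

# Quick check of the key identity A*Q_m = Q_{m+1}*H_m on an assumed test matrix:
A = randn(60, 60)
Q, H = arnoldi(A, randn(60), 20)
@show norm(A * Q[:, 1:20] - Q * H)         # should be near machine precision
@show norm(Q' * Q - I)                     # columns should be orthonormal
```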

In the next section we revisit the idea of approximately solving $\mathbf{A}\mathbf{x}=\mathbf{b}$ over a Krylov subspace $\mathcal{K}_m$, using the ONC matrix $\mathbf{Q}_m$ in place of $\mathbf{K}_m$. A related idea explored in Exercise 7 is used to approximate the eigenvalue problem for $\mathbf{A}$, which is the approach that underlies eigs for sparse matrices.

8.4.6 Exercises

  1. ✍ Let $\mathbf{A}=\displaystyle \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix}.$

    (a) Find the Krylov matrix $\mathbf{K}_3$ for the seed vector $\mathbf{u}=\mathbf{e}_1$.

    (b) Find $\mathbf{K}_3$ for the seed vector $\mathbf{u}=\begin{bmatrix}1; \: 1; \: 1; \: 1\end{bmatrix}$.

  2. ⌨ For each matrix, make a table of the 2-norm condition numbers $\kappa(\mathbf{K}_m)$ for $m=1,\ldots,10$. Use a vector of all ones as the Krylov seed.

    (a) Matrix from Demo 8.4.1

    (b) $\begin{bmatrix} -2 & 1 & & & \\ 1 & -2 & 1 & & \\ & \ddots & \ddots & \ddots & \\ & & 1 & -2 & 1 \\ & & & 1 & -2 \end{bmatrix} \quad (100\times 100)$

    (c) $\begin{bmatrix} -2 & 1 & & & 1 \\ 1 & -2 & 1 & & \\ & \ddots & \ddots & \ddots & \\ & & 1 & -2 & 1 \\ 1 & & & 1 & -2 \end{bmatrix} \quad (200\times 200)$

  3. ✍ Show that if $\mathbf{x}\in\mathcal{K}_m$, then $\mathbf{x}=p(\mathbf{A})\mathbf{u}$ for a polynomial $p$ of degree at most $m-1$. (See (7.2.11) for applying a polynomial to a matrix.)

  4. ✍ Compute the asymptotic flop requirements for Function 8.4.2. Assume that due to sparsity, a matrix-vector multiplication $\mathbf{A}\mathbf{u}$ requires only $c n$ flops for a constant $c$, rather than the usual $O(n^2)$.

  5. ⌨ When Arnoldi iteration is performed on the Krylov subspace generated using the matrix $\mathbf{A}=\displaystyle \begin{bmatrix} 2 & 1 & 1 & 0 \\ 1 & 3 & 1 & 0 \\ 0 & 1 & 3 & 1 \\ 0 & 1 & 1 & 2 \end{bmatrix}$, the results can depend strongly on the initial vector $\mathbf{u}$.

    (a) Apply Function 8.4.2 and output Q and H when using the following seed vectors.

    (i) $\bigl[1,\,0,\,0,\,0\bigr]$    (ii) $\bigl[1,\,1,\,1,\,1\bigr]$    (iii) rand(4)

    (b) Can you explain why case (ii) in part (a) cannot finish successfully? (Hint: What line(s) of the function can possibly return NaN when applied to finite values?)

  6. ✍ As mentioned in the text, Function 8.4.2 does not compute $H_{ij}$ as defined by (8.4.7), but rather

    $$
    S_{ij} = \mathbf{q}_i^* ( \mathbf{A}\mathbf{q}_j - S_{1j}\,\mathbf{q}_1 - \cdots - S_{i-1,j}\,\mathbf{q}_{i-1} )
    $$

    for $i=1,\ldots,j$. Show that $S_{ij}=H_{ij}$. (Hence the function is mathematically equivalent to our Arnoldi formulas.)

  7. One way to approximate the eigenvalue problem $\mathbf{A}\mathbf{x}=\lambda\mathbf{x}$ over $\mathcal{K}_m$ is to restrict $\mathbf{x}$ to the low-dimensional space $\mathcal{K}_m$.

    (a) ✍ Show starting from (8.4.9) that

    $$
    \mathbf{Q}_m^* \mathbf{A} \mathbf{Q}_m = \tilde{\mathbf{H}}_m,
    $$

    where $\tilde{\mathbf{H}}_m$ is the upper Hessenberg matrix resulting from deleting the last row of $\mathbf{H}_m$. What is the size of this matrix?

    (b) ✍ Show the reasoning above leads to the approximate eigenvalue problem $\tilde{\mathbf{H}}_m\mathbf{z} \approx \lambda\mathbf{z}$. (Hint: Start with $\mathbf{A}\mathbf{x} \approx \lambda\mathbf{x}$, and let $\mathbf{x}=\mathbf{Q}_m\mathbf{z}$ before applying part (a).)

    (c) ⌨ Apply Function 8.4.2 to the matrix of Demo 8.4.1 using a random seed vector. Compute eigenvalues of $\tilde{\mathbf{H}}_m$ for $m=1,\ldots,40$, keeping track in each case of the error between the largest of those values (in magnitude) and the largest eigenvalue of $\mathbf{A}$. Make a log-linear graph of the error as a function of $m$.

Footnotes
  1. The proper pronunciation of “Krylov” is something like “kree-luv,” but American English speakers often say “kreye-lahv.”