
GMRES

The most important use of the Arnoldi iteration is to solve the square linear system $\mathbf{A}\mathbf{x}=\mathbf{b}$, resulting in a well-known algorithm called GMRES.[1]

In Example 8.4.1, we attempted to replace the linear system $\mathbf{A}\mathbf{x}=\mathbf{b}$ by the lower-dimensional approximation

$$
\min_{\mathbf{x}\in \mathcal{K}_m} \| \mathbf{A}\mathbf{x}-\mathbf{b} \| = \min_{\mathbf{z}\in\mathbb{C}^m} \| \mathbf{A}\mathbf{K}_m\mathbf{z}-\mathbf{b} \|, \tag{8.5.1}
$$

where $\mathbf{K}_m$ is the Krylov matrix generated using $\mathbf{A}$ and the seed vector $\mathbf{b}$. This method proved unstable due to the poor conditioning of $\mathbf{K}_m$, whose columns are a numerically poor basis for $\mathcal{K}_m$. Instead, we can use the orthonormal basis $\mathbf{Q}_m$ generated by the Arnoldi iteration.

8.5.1 Derivation

If we set $\mathbf{x}=\mathbf{Q}_m\mathbf{z}$ in the residual $\mathbf{b} - \mathbf{A}\mathbf{x}$, we obtain a way to minimize the norm of the residual over $\mathcal{K}_m$:

$$
\min_{\mathbf{z}\in\mathbb{C}^m}\, \bigl\| \mathbf{A} \mathbf{Q}_m \mathbf{z} - \mathbf{b} \bigr\|. \tag{8.5.2}
$$

From the fundamental Arnoldi identity (8.4.10), this is equivalent to

$$
\min_{\mathbf{z}\in\mathbb{C}^m}\, \bigl\| \mathbf{Q}_{m+1} \mathbf{H}_m\mathbf{z}-\mathbf{b} \bigr\|. \tag{8.5.3}
$$

Note that $\mathbf{q}_1$ is a unit multiple of $\mathbf{b}$, so $\mathbf{b} = \|\mathbf{b}\|\, \mathbf{Q}_{m+1}\mathbf{e}_1$. Thus, (8.5.3) becomes

$$
\min_{\mathbf{z}\in\mathbb{C}^m}\, \bigl\| \mathbf{Q}_{m+1} (\mathbf{H}_m\mathbf{z}-\|\mathbf{b}\|\,\mathbf{e}_1) \bigr\|. \tag{8.5.4}
$$

The least-squares problems (8.5.2), (8.5.3), and (8.5.4) are all $n\times m$. But observe that for any $\mathbf{w}\in\mathbb{C}^{m+1}$,

$$
\|\mathbf{Q}_{m+1}\mathbf{w}\|^2 = \mathbf{w}^*\mathbf{Q}_{m+1}^*\mathbf{Q}_{m+1}\mathbf{w} = \mathbf{w}^*\mathbf{w} = \|\mathbf{w}\|^2. \tag{8.5.5}
$$

The first norm in that equation is on $\mathbb{C}^n$, while the last is on the much smaller space $\mathbb{C}^{m+1}$. Hence, the least-squares problem (8.5.4) is equivalent to

$$
\min_{\mathbf{z}\in\mathbb{C}^m}\, \bigl\| \mathbf{H}_m\mathbf{z}-\|\mathbf{b}\|\, \mathbf{e}_1 \bigr\|, \tag{8.5.6}
$$

which is of size $(m+1)\times m$. We call the solution of this minimization $\mathbf{z}_m$, and then $\mathbf{x}_m=\mathbf{Q}_m \mathbf{z}_m$ is the $m$th approximation to the solution of $\mathbf{A}\mathbf{x}=\mathbf{b}$.

In exact arithmetic, GMRES should get the exact solution when $m=n$,[2] but the goal is to reduce the residual enough to stop at some $m \ll n$.

A basic implementation of GMRES is given in Function 8.5.2.
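That function is not reproduced here. As a stand-in, the following is a minimal sketch in Python/NumPy of the derivation above: run the Arnoldi iteration and, at each step, solve the small least-squares problem (8.5.6) for $\mathbf{z}_m$, then form $\mathbf{x}_m=\mathbf{Q}_m\mathbf{z}_m$. The name `gmres_sketch` and its parameters are illustrative assumptions, real arithmetic is assumed, and the breakdown case (footnote 2) is handled only crudely.

```python
# A minimal GMRES sketch; NOT the book's Function 8.5.2.
import numpy as np

def gmres_sketch(A, b, m_max=50, tol=1e-10):
    n = b.size
    Q = np.zeros((n, m_max + 1))       # orthonormal Krylov basis vectors
    H = np.zeros((m_max + 1, m_max))   # upper Hessenberg matrix from Arnoldi
    normb = np.linalg.norm(b)
    Q[:, 0] = b / normb                # q_1 is a unit multiple of b
    x = np.zeros(n)
    resnorms = [normb]
    for m in range(m_max):
        # One Arnoldi step: orthogonalize A q_m against q_1, ..., q_m.
        v = A @ Q[:, m]
        for i in range(m + 1):
            H[i, m] = Q[:, i] @ v
            v -= H[i, m] * Q[:, i]
        H[m + 1, m] = np.linalg.norm(v)
        if H[m + 1, m] > 0:            # guard against (lucky) breakdown
            Q[:, m + 1] = v / H[m + 1, m]
        # Solve the small least-squares problem (8.5.6) for z_m ...
        rhs = np.zeros(m + 2)
        rhs[0] = normb                 # ||b|| e_1
        z = np.linalg.lstsq(H[:m + 2, :m + 1], rhs, rcond=None)[0]
        # ... and map back to the full space: x_m = Q_m z_m.
        x = Q[:, :m + 1] @ z
        resnorms.append(np.linalg.norm(b - A @ x))
        if resnorms[-1] <= tol * normb:
            break
    return x, resnorms
```

For a well-conditioned system, `x` improves as `m` grows, and `resnorms` records the residual history, which (per the observation below) is non-increasing.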

8.5.2 Convergence

In Example 8.4.1, we tried to implement the GMRES idea using the Krylov matrix. After some initial reduction in the norm of the residual, improvement stopped due to the poor conditioning of that matrix. We return to the linear system in that example now to show that a stable implementation based on the Arnoldi iteration continues to improve the residual nearly to machine precision.
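That demonstration is not reproduced here. As an illustrative stand-in, the sketch below records the GMRES residual history using SciPy's `scipy.sparse.linalg.gmres`; the random, well-conditioned matrix is an assumption for demonstration, not the system from Example 8.4.1.

```python
# Monitoring GMRES residuals via SciPy's callback mechanism.
import numpy as np
from scipy.sparse.linalg import gmres

rng = np.random.default_rng(0)
n = 200
# A well-conditioned stand-in matrix: identity plus a small random perturbation.
A = np.eye(n) + rng.standard_normal((n, n)) / (2 * np.sqrt(n))
b = rng.standard_normal(n)

resnorms = []  # one relative residual norm per inner iteration
x, info = gmres(A, b, restart=n, maxiter=n,
                callback=resnorms.append, callback_type="pr_norm")
print(info, len(resnorms), resnorms[-1])
```

Within each restart cycle, these recorded norms decrease monotonically, consistent with the observation that follows.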

Thanks to Theorem 8.4.1, minimization of $\|\mathbf{b}-\mathbf{A}\mathbf{x}\|$ over $\mathcal{K}_{m+1}$ includes minimization over the subset $\mathcal{K}_m$. Hence, the norm of the residual $\mathbf{b}-\mathbf{A}\mathbf{x}_m$ can never increase as $m$ grows.

Unfortunately, making other conclusive statements about the convergence of GMRES is not easy. Example 8.5.1 shows the cleanest behavior: essentially linear convergence down to the range of machine epsilon. But it is possible for the convergence to go through phases of sublinear and superlinear convergence as well. There is a strong dependence on the eigenvalues of the matrix, a fact we address with more precision and detail in the next section.

8.5.3 Restarting

One of the practical challenges in GMRES is that as the dimension of the Krylov subspace grows, the number of new entries to be found in $\mathbf{H}_m$ and the total number of basis vectors grow, too. Consequently, both the work and the storage requirements are quadratic in $m$, which can become intolerable in some applications. For this reason, GMRES is often used with restarting.

Suppose $\hat{\mathbf{x}}$ is an approximate solution of $\mathbf{A}\mathbf{x}=\mathbf{b}$. If we set $\mathbf{x}=\mathbf{u}+\hat{\mathbf{x}}$, then $\mathbf{A}(\mathbf{u}+\hat{\mathbf{x}}) = \mathbf{b}$, or $\mathbf{A}\mathbf{u} = \mathbf{b} - \mathbf{A}\hat{\mathbf{x}}$. The conclusion is that if we compute the residual $\mathbf{r}=\mathbf{b} - \mathbf{A}\hat{\mathbf{x}}$ of an approximate solution, then we need only solve $\mathbf{A}\mathbf{u} = \mathbf{r}$ to get a correction to $\hat{\mathbf{x}}$.[3]
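To make the correction idea concrete, here is a minimal restarted loop. It reuses the illustrative `gmres_sketch` helper from the earlier snippet; the name `gmres_restarted` and the default parameters are assumptions, not the book's code.

```python
# A sketch of restarted GMRES built on the correction identity A u = r.
import numpy as np

def gmres_restarted(A, b, m=20, maxcycles=10, tol=1e-10):
    x = np.zeros_like(b, dtype=float)
    normb = np.linalg.norm(b)
    for _ in range(maxcycles):
        r = b - A @ x                        # residual of the current iterate
        u, _ = gmres_sketch(A, r, m_max=m)   # approximate correction, A u ≈ r
        x = x + u                            # restart from the corrected iterate
        # Per footnote 3, test accuracy relative to ||b||, not the shrinking ||r||.
        if np.linalg.norm(b - A @ x) <= tol * normb:
            break
    return x
```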

Restarting guarantees a fixed upper bound on the per-iteration cost of GMRES. However, this benefit comes at a price. Even though restarting preserves the progress made in previous iterations, the Krylov space information is discarded and the residual minimization starts over from low-dimensional spaces. That can slow convergence significantly, or even cause it to stagnate.

Restarting creates a tradeoff between the number of iterations and the cost per iteration. It is essentially impossible to predict the ideal restart value for any given problem, so one goes by experience and hopes for the best.

There are other ways to avoid the growth in computational effort as the GMRES/Arnoldi iteration proceeds. Three of the more popular variations are abbreviated CGS, BiCGSTAB, and QMR. We do not describe them in this book.

8.5.4 Exercises

Footnotes
  1. GMRES stands for Generalized Minimum RESidual. We will encounter its precursor, MINRES, in the section MINRES and conjugate gradients.

  2. This statement is not strictly correct for rare special cases of breakdown where the rank of $\mathcal{K}_m$ is less than $m$. In that situation, some additional steps must be taken that we do not discuss here.

  3. The new problem needs to be solved for accuracy relative to $\|\mathbf{b}\|$, not relative to $\|\mathbf{r}\|$.