GMRES - Fundamentals of Numerical Computation

The most important use of the Arnoldi iteration is to solve the square linear system $\mathbf{A}\mathbf{x}=\mathbf{b}$ , resulting in a well-known algorithm called GMRES.^[1]

In Example 8.4.1, we attempted to replace the linear system $\mathbf{A}\mathbf{x}=\mathbf{b}$ by the lower-dimensional approximation

\min_{\mathbf{x}\in \mathcal{K}_m} \| \mathbf{A}\mathbf{x}-\mathbf{b} \| = \min_{\mathbf{z}\in\mathbb{C}^m} \| \mathbf{A}\mathbf{K}_m\mathbf{z}-\mathbf{b} \|,

(8.5.1)

where $\mathbf{K}_m$ is the Krylov matrix generated using $\mathbf{A}$ and the seed vector $\mathbf{b}$ . This method was unstable due to the poor conditioning of $\mathbf{K}_m$ , which is a numerically poor basis for $\mathcal{K}_m$ . Instead, we can use the orthonormal basis $\mathbf{Q}_m$ generated by the Arnoldi iteration.

8.5.1Derivation¶

If we set $\mathbf{x}=\mathbf{Q}_m\mathbf{z}$ in the residual $\mathbf{b} - \mathbf{A}\mathbf{x}$ , we obtain a way to minimize the norm of the residual over $\mathcal{K}_m$ :

\min_{\mathbf{z}\in\mathbb{C}^m}\, \bigl\| \mathbf{A} \mathbf{Q}_m \mathbf{z} - \mathbf{b} \bigr\|.

(8.5.2)

From the fundamental Arnoldi identity (8.4.10), this is equivalent to

\min_{\mathbf{z}\in\mathbb{C}^m}\, \bigl\| \mathbf{Q}_{m+1} \mathbf{H}_m\mathbf{z}-\mathbf{b} \bigr\|.

(8.5.3)

Note that $\mathbf{q}_1$ is a unit multiple of $\mathbf{b}$ , so $\mathbf{b} = \|\mathbf{b}\| \mathbf{Q}_{m+1}\mathbf{e}_1$ . Thus, (8.5.3) becomes

\min_{\mathbf{z}\in\mathbb{C}^m}\, \bigl\| \mathbf{Q}_{m+1} (\mathbf{H}_m\mathbf{z}-\|\mathbf{b}\|\mathbf{e}_1) \bigr\|.

(8.5.4)

The least-squares problems (8.5.2), (8.5.3), and (8.5.4) are all $n\times m$ . But observe that for any $\mathbf{w}\in\mathbb{C}^{m+1}$ ,

\|\mathbf{Q}_{m+1}\mathbf{w}\|^2 = \mathbf{w}^*\mathbf{Q}_{m+1}^*\mathbf{Q}_{m+1}\mathbf{w} = \mathbf{w}^*\mathbf{w} = \|\mathbf{w}\|^2.

(8.5.5)

The first norm in that equation is on $\mathbb{C}^n$ , while the last is on the much smaller space $\mathbb{C}^{m+1}$ . Hence, the least-squares problem (8.5.4) is equivalent to

\min_{\mathbf{z}\in\mathbb{C}^m}\, \bigl\| \mathbf{H}_m\mathbf{z}-\|\mathbf{b}\|\, \mathbf{e}_1 \bigr\|,

(8.5.6)

which is of size $(m+1)\times m$ . We call the solution of this minimization $\mathbf{z}_m$ , and then $\mathbf{x}_m=\mathbf{Q}_m \mathbf{z}_m$ is the $m$ th approximation to the solution of $\mathbf{A}\mathbf{x}=\mathbf{b}$ .

In exact arithmetic, GMRES should get the exact solution when $m=n$ , but the goal is to reduce the residual enough to stop at some $m \ll n$ .

A basic implementation of GMRES is given in Function 8.5.2.

Algorithm 8.5.2 (gmres)

Julia

MATLAB

Python

GMRES

gmres.jl

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
"""
    gmres(A, b, m)

Do `m` iterations of GMRES for the linear system `A`*x=`b`. Returns
the final solution estimate x and a vector with the history of
residual norms. (This function is for demo only, not practical use.)
"""
function gmres(A, b, m)
    n = length(b)
    Q = zeros(n, m+1)
    Q[:, 1] = b / norm(b)
    H = zeros(m+1, m)

    # Initial solution is zero.
    x = 0
    residual = [norm(b); zeros(m)]

    for j in 1:m
        # Next step of Arnoldi iteration.
        v = A * Q[:, j]
        for i in 1:j
            H[i, j] = dot(Q[:, i], v)
            v -= H[i, j] * Q[:, i]
        end
        H[j+1, j] = norm(v)
        Q[:, j+1] = v / H[j+1, j]

        # Solve the minimum residual problem.
        r = [norm(b); zeros(j)]
        z = H[1:j+1, 1:j] \ r
        x = Q[:, 1:j] * z
        residual[j+1] = norm(A * x - b)
    end
    return x, residual
end

GMRES

arngmres.m

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
function [x, residual] = arngmres(A, b, m)
    % ARNGMRES   GMRES for a linear system (demo only).
    % Input:
    %   A       square matrix (n by n)
    %   b       right-hand side (n by 1)
    %   M       number of iterations
    % Output: 
    %   x       approximate solution (n by 1)
    %   r       history of norms of the residuals

    n = length(A);
    Q = zeros(n, m+1);  
    Q(:, 1) = b / norm(b);
    H = zeros(m+1 ,m);

    % Initial "solution" is zero.
    residual = zeros(m+1, 1);
    residual(1) = norm(b);
    for j = 1:m
        % Next step of Arnoldi iteration.
        v = A * Q(:, j);
        for i = 1:j
            H(i, j) = Q(:, i)' * v;
            v = v - H(i, j) * Q(:,i);
        end
        H(j+1, j) = norm(v);
        Q(:, j+1) = v / H(j+1, j);
      
        % Solve the minimum residual problem.
        r = norm(b) * eye(j+1, 1);
        z = H(1:j+1, 1:j) \ r;
        x = Q(:, 1:j) * z;
        residual(j+1) = norm( A*x - b );
    end
end

GMRES

gmres.py

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
def gmres(A, b, m):
    """
    gmres(A, b, m)

    Do m iterations of GMRES for the linear system A*x=b. Return the final solution
    estimate x and a vector with the history of residual norms. (This function is for
    demo only, not practical use.)
    """
    n = len(b)
    Q = np.zeros([n, m + 1])
    Q[:, 0] = b / np.linalg.norm(b)
    H = np.zeros([m + 1, m])

    # Initial "solution" is zero.
    residual = np.hstack([np.linalg.norm(b), np.zeros(m)])

    for j in range(m):
        # Next step of Arnoldi iteration.
        v = A @ Q[:, j]
        for i in range(j + 1):
            H[i, j] = Q[:, i] @ v
            v -= H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(v)
        Q[:, j + 1] = v / H[j + 1, j]

        # Solve the minimum residual problem.
        r = np.hstack([np.linalg.norm(b), np.zeros(j + 1)])
        z = np.linalg.lstsq(H[:j + 2, :j + 1], r)[0]
        x = Q[:, :j + 1] @ z
        residual[j + 1] = np.linalg.norm(A @ x - b)

    return x, residual

8.5.2Convergence¶

In Example 8.4.1, we tried to implement the GMRES idea using the Krylov matrix. After some initial reduction in the norm of the residual, improvement stopped due to the poor conditioning of that matrix. We return to the linear system in that example now to show that a stable implementation based on the Arnoldi iteration continues to improve the residual nearly to machine precision.

Example 8.5.1 (GMRES)

Julia

MATLAB

Python

Example 8.5.1

We define a triangular matrix with known eigenvalues and a random vector $\mathbf{b}$ .

λ = @. 10 + (1:100)
A = triu(rand(100, 100), 1) + diagm(λ)
b = rand(100);

Instead of building the Krylov matrices, we use the Arnoldi iteration to generate equivalent orthonormal vectors.

Q, H = FNC.arnoldi(A, b, 60);

The Arnoldi bases are used to solve the least-squares problems defining the GMRES iterates.

resid = [norm(b); zeros(60)]
for m in 1:60
    s = [norm(b); zeros(m)]
    z = H[1:m+1, 1:m] \ s
    x = Q[:, 1:m] * z
    resid[m+1] = norm(b - A * x)
end

The approximations converge smoothly, practically all the way to machine epsilon.

using Plots
plot(0:60, resid, m=:o,
    xaxis=(L"m"),  yaxis=(:log10, "norm of mth residual"),
    title="Residual for GMRES",  legend=:none)

Example 8.5.1

We define a triangular matrix with known eigenvalues and a random vector $\mathbf{b}$ .

lambda = 10 + (1:100);
A = diag(lambda) + triu(rand(100), 1); 
b = rand(100, 1);

Instead of building the Krylov matrices, we use the Arnoldi iteration to generate equivalent orthonormal vectors.

[Q, H] = arnoldi(A, b, 60);

The Arnoldi bases are used to solve the least-squares problems defining the GMRES iterates.

resid = norm(b);
for m = 1:60
    s = [norm(b); zeros(m, 1)];
    z = H(1:m+1, 1:m) \ s;
    x = Q(:, 1:m) * z;
    resid = [resid, norm(b - A * x)];
end

The approximations converge smoothly, practically all the way to machine epsilon.

clf
semilogy(resid,'.-')
xlabel('m'),  ylabel('|| b - Ax_m ||')
axis tight, title(('Residual for GMRES'));

Example 8.5.1

We define a triangular matrix with known eigenvalues and a random vector $\mathbf{b}$ .

ev = 10 + arange(1, 101)
A = triu(random.rand(100, 100), 1) + diag(ev)
b = random.rand(100)

Instead of building the Krylov matrices, we use the Arnoldi iteration to generate equivalent orthonormal vectors.

Q, H = FNC.arnoldi(A, b, 60)
print(H[:5, :5])

[[76.88449428 38.15381598  6.3506509   4.00620728 -2.8865864 ]
 [22.63671462 53.13497038 28.61912386  2.33800544 -0.85606513]
 [ 0.         23.60292326 61.52225602 24.14890347  1.33049411]
 [ 0.          0.         22.34926875 59.09751735 26.75599015]
 [ 0.          0.          0.         25.47210946 59.89228218]]

The Arnoldi bases are used to solve the least-squares problems defining the GMRES iterates.

from numpy.linalg import lstsq
resid = zeros(61)
resid[0] = norm(b)
for m in range(1, 61):
    s = hstack([norm(b), zeros(m)])
    z = lstsq(H[: m + 1, :m], s, rcond=None)[0]
    x = Q[:, :m] @ z
    resid[m] = norm(b - A @ x)

The approximations converge smoothly, practically all the way to machine epsilon.

semilogy(range(61), resid, "-o")
xlabel("$m$"),  ylabel("$\| b-Ax_m \|$")
title("Residual for GMRES");

<>:2: SyntaxWarning: invalid escape sequence '\|'
<>:2: SyntaxWarning: invalid escape sequence '\|'
/var/folders/0m/fy_f4_rx5xv68sdkrpcm26r00000gq/T/ipykernel_16034/580350478.py:2: SyntaxWarning: invalid escape sequence '\|'
  xlabel("$m$"),  ylabel("$\| b-Ax_m \|$")

Thanks to Theorem 8.4.1, minimization of $\|\mathbf{b}-\mathbf{A}\mathbf{x}\|$ over $\mathcal{K}_{m+1}$ includes minimization over the subset $\mathcal{K}_m$ . Hence:

Unfortunately, making other conclusive statements about the convergence of GMRES is not easy. Example 8.5.1 shows the cleanest behavior: essentially, linear convergence down to the range of machine epsilon. But it is possible for the convergence to go through phases of sublinear and superlinear convergence as well. There is a strong dependence on the eigenvalues of the matrix, a fact we address with more precision and detail in the next section.

8.5.3Restarting¶

One of the practical challenges in GMRES is that as the dimension of the Krylov subspace grows, the number of new entries to be found in $\mathbf{H}_m$ and the total number of basis vectors grow, too. Consequently, both the work and the storage requirements are quadratic in $m$ , which can become intolerable in some applications. For this reason, GMRES is often used with restarting.

Suppose $\hat{\mathbf{x}}$ is an approximate solution of $\mathbf{A}\mathbf{x}=\mathbf{b}$ . Then, if we set $\mathbf{x}=\mathbf{u}+\hat{\mathbf{x}}$ , we have $\mathbf{A}(\mathbf{u}+\hat{\mathbf{x}}) = \mathbf{b}$ , or $\mathbf{A}\mathbf{u} = \mathbf{b} - \mathbf{A}\hat{\mathbf{x}}$ . The conclusion is that if we get an approximate solution and compute its residual $\mathbf{r}=\mathbf{b} - \mathbf{A}\hat{\mathbf{x}}$ , then we need only to solve $\mathbf{A}\mathbf{u} = \mathbf{r}$ in order to get a correction to $\hat{\mathbf{x}}$ .^[3]

Restarting guarantees a fixed upper bound on the per-iteration cost of GMRES. However, this benefit comes at a price. Even though restarting preserves progress made in previous iterations, the Krylov space information is discarded and the residual minimization process starts again over low-dimensional spaces. That can significantly retard or even stagnate the convergence.

Example 8.5.2 (Restarting GMRES)

Julia

MATLAB

Python

Example 8.5.2

The following experiments are based on a matrix resulting from discretization of a partial differential equation.

A = FNC.poisson(50)
n = size(A, 1)
b = ones(n);
spy(A, color=:blues)

We compare unrestarted GMRES with three different thresholds for restarting. Here we are using gmres from the IterativeSolvers package, since our simple implementation does not offer restarting.

using IterativeSolvers
reltol = 1e-12;
plt = plot(title="Convergence of restarted GMRES", legend=:bottomleft,
    xaxis=(L"m"),  yaxis=(:log10, "residual norm", [1e-8, 100]))

for restart in [n, 20, 40, 60]
    x, hist = IterativeSolvers.gmres(A, b; restart, reltol,
        maxiter=100, log=true)
    plot!(hist[:resnorm], label="restart = $restart")
end

plt

The “pure” GMRES curve is the lowest one. All of the other curves agree with it until the first restart. Decreasing the restart value makes the convergence per iteration generally worse, but the time required per iteration smaller as well.

Example 8.5.2

The following experiments are based on a matrix resulting from discretization of a partial differential equation.

d = 50;
A = d^2 * gallery('poisson', d);
n = size(A, 1)
b = ones(n, 1);
clf,  spy(A)

We compare unrestarted GMRES with three different thresholds for restarting. Here we are using the built-in gmres, since our simple implementation does not offer restarting.

clf
restart = [120, 20, 40, 60];
for j = 1:4
    [~,~,~,~,rv] = gmres(A, b, restart(j), 1e-9,120 / restart(j));
    semilogy(0:length(rv) - 1, rv),  hold on
end
title('Convergence of restarted GMRES')
xlabel('m'),  ylabel('residual norm')
legend('no restart','every 20','every 40','every 60','location','southwest');

Example 8.5.2

The following experiments are based on a matrix resulting from discretization of a partial differential equation.

d = 50;  n = d**2
A = FNC.poisson2d(d)
b = ones(n)
spy(A);

We compare unrestarted GMRES with three different thresholds for restarting. Here we are using gmres from scipy.sparse.linalg, since our simple implementation does not offer restarting. We’re also using a trick to accumulate the vector of residual norms as it runs.

from scipy.sparse.linalg import gmres
ctr = lambda rvec: resid.append(norm(rvec))
resid = [1.]
x, flag = gmres(A, b, restart=None, rtol=1e-8, atol=1e-14, maxiter=120, callback=ctr)
semilogy(resid); 
xlabel("$m$"), ylabel("residual norm")
title(("Convergence of unrestarted GMRES"));

maxit = 120
rtol = 1e-8
restarts = [maxit, 20, 40, 60]
hist = lambda rvec: resid.append(norm(rvec))
for r in restarts:
    resid = [1.]
    x, flag = gmres(A, b, restart=r, rtol=rtol, atol=1e-14, maxiter=maxit, callback=hist)
    semilogy(resid)

ylim(1e-8, 2)
legend(["none", "20", "40", "60"])
title(("Convergence of restarted GMRES"));

Restarting creates a tradeoff between the number of iterations and the speed per iteration. It’s essentially impossible in general to predict the ideal restart location in any given problem, so one goes by experience and hopes for the best.

There are other ways to avoid the growth in computational effort as the GMRES/Arnoldi iteration proceeds. Three of the more popular variations are abbreviated CGS, BiCGSTAB, and QMR. We do not describe them in this book.

8.5.4Exercises¶

Exercise 8.5.5

⌨ In this exercise you will see the strong effect the eigenvalues of the matrix may have on GMRES convergence. Let

\mathbf{B}= \begin{bmatrix} 1 & & & \\ & 2 & & \\ & & \ddots & \\ & & & 100 \end{bmatrix},

(8.5.9)

let $\mathbf{I}$ be a $100\times 100$ identity, and let $\mathbf{Z}$ be a $100\times 100$ matrix of zeros. Also let $\mathbf{b}$ be a $200\times 1$ vector of ones. You will use GMRES with restarts, as in Example 8.5.2 (i.e., not the book’s version of gmres).

(a) Let $\mathbf{A} = \begin{bmatrix} \mathbf{B} & \mathbf{I} \\ \mathbf{Z} & \mathbf{B} \end{bmatrix}.$ What are its eigenvalues (no computer required here)? Apply gmres with tolerance 10^-10 for 100 iterations without restarts, and plot the residual convergence.

(b) Repeat part (a) with restarts every 20 iterations.

(c) Now let $\mathbf{A} = \begin{bmatrix} \mathbf{B} & \mathbf{I} \\ \mathbf{Z} & -\mathbf{B} \end{bmatrix}.$ What are its eigenvalues? Repeat part (a). Which matrix is more difficult for GMRES? (Note: Even though this matrix is triangular, GMRES has no way of exploiting that fact.)

Footnotes¶

GMRES stands for Generalized Minimum RESidual. We will encounter its precursor MINRES in MINRES and conjugate gradients.
↩
This statement is not strictly correct for rare special cases of breakdown where the rank of $\mathcal{K}_m$ is less than $m$ . In that situation, some additional steps must be taken that we do not discuss here.
↩
The new problem needs to be solved for accuracy relative to $\|\mathbf{b}\|$ , not relative to $\|\mathbf{r}\|$ .
↩

Preface

Krylov subspaces

Preface

MINRES and conjugate gradients