8.8 Preconditioning
Tobin A. Driscoll and Richard J. Braun
An important aspect of the convergence behavior of MINRES and CG (and, by extension, GMRES) is that it can be expected to deteriorate as the condition number of the matrix increases. Even moderately large condition numbers can make the convergence impractically slow. Therefore, it's common for these methods to be used together with a technique that reduces the relevant condition number. Given the system $\mathbf{A}\mathbf{x}=\mathbf{b}$, we choose a nonsingular matrix $\mathbf{M}$, called a preconditioner, and apply the Krylov method to the equivalent system

$$\mathbf{M}^{-1}\mathbf{A}\mathbf{x} = \mathbf{M}^{-1}\mathbf{b}. \qquad (8.8.1)$$

More specifically, (8.8.1) is known as left preconditioning, which is the simplest and most common type.
As usual, we do not want to actually compute $\mathbf{M}^{-1}$ for a given $\mathbf{M}$. Instead, we have a linear system with the matrix $\mathbf{M}^{-1}\mathbf{A}$. In a Krylov method, the operation "let $\mathbf{v}=\mathbf{A}\mathbf{u}$" becomes a two-step process:
1. Set $\mathbf{y}=\mathbf{A}\mathbf{u}$.
2. Solve $\mathbf{M}\mathbf{v}=\mathbf{y}$ for $\mathbf{v}$.

As an implementation detail, it is common to provide the Krylov solver with code that performs step 2; if the matrix $\mathbf{M}$ itself is given, the default is to use sparse factorization.
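To make the two-step product concrete, here is a minimal Julia sketch. The matrix, its size, and the choice of $\mathbf{M}$ (the lower triangle of $\mathbf{A}$) are made up purely for illustration; the point is that $\mathbf{M}$ is factored once and the factorization is reused for every product.

using LinearAlgebra, SparseArrays
n = 1000
A = sprandn(n, n, 0.005) + 10I    # made-up sparse system matrix
M = tril(A)                       # made-up choice of preconditioner
F = lu(M)                         # sparse factorization of M, computed once
u = randn(n)
y = A * u                         # step 1: ordinary product with A
v = F \ y                         # step 2: solve M*v = y, giving v = M\(A*u)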
There are competing objectives in the choice of $\mathbf{M}$. On one hand, we want $\mathbf{M}^{-1}\mathbf{A}\approx \mathbf{I}$ in some sense, because that makes (8.8.1) easy to solve by Krylov iteration; hence $\mathbf{M}\approx \mathbf{A}$. On the other hand, we desire that solving the system $\mathbf{M}\mathbf{v}=\mathbf{y}$ be relatively fast.
8.8.1 Diagonal preconditioning

One of the simplest choices for the preconditioner $\mathbf{M}$ is a diagonal matrix. This definitely meets the requirement of being fast to invert: the solution of $\mathbf{M}\mathbf{v}=\mathbf{y}$ is just $v_i=y_i/M_{ii}$. The only question is whether $\mathbf{M}$ can be chosen in such a way that $\mathbf{M}^{-1}\mathbf{A}$ is much more amenable to Krylov iterations than $\mathbf{A}$ is. This may be the case when the rows of $\mathbf{A}$ differ greatly in scale, or when $\mathbf{A}$ is diagonally dominant (see (2.9.1)).
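As a tiny made-up illustration (separate from the worked example that follows), the preconditioner solve with a diagonal $\mathbf{M}$ is nothing more than an elementwise division:

d = [2.0, 10.0, 0.5, 4.0]    # hypothetical diagonal entries of M
y = [1.0, 1.0, 1.0, 1.0]
v = y ./ d                   # solves M*v = y when M = diagm(d)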
Example 8.8.1 (Diagonal preconditioning)
Example 8.8.1 (Julia). Here is an SPD matrix that arises from solving partial differential equations.
using MatrixDepot, SparseArrays
A = matrixdepot("wathen", 60)
n = size(A, 1)
@show n, nnz(A);
(n, nnz(A)) = (11041, 170161)
There is an easy way to use the diagonal elements of $\mathbf{A}$, or any other vector, as a diagonal preconditioner.
using Preconditioners, LinearAlgebra
b = ones(n)
M = DiagonalPreconditioner(diag(A));
We now compare CG with and without the preconditioner.
using IterativeSolvers, Plots
plain(b) = cg(A, b, maxiter=200, reltol=1e-4, log=true)
time_plain = @elapsed x, hist1 = plain(b)
prec(b) = cg(A, b, Pl=M, maxiter=200, reltol=1e-4, log=true)
time_prec = @elapsed x, hist2 = prec(b)
@show time_plain, time_prec
rr = hist1[:resnorm]
plot(0:length(rr)-1, rr / rr[1], yscale=:log10, label="plain")
rr = hist2[:resnorm]
plot!(0:length(rr)-1, rr / rr[1], yscale=:log10, label="preconditioned")
title!("Diagonal preconditioning in CG")
The diagonal preconditioner cut down substantially on the number of iterations. The effect on the total time is less dramatic, but this is not a large version of the problem.
Example 8.8.1 (MATLAB). Here is an SPD matrix that arises from solving partial differential equations.
A = gallery("wathen", 60, 60);
n = size(A, 1);
clf, spy(A)
There is an easy way to use the diagonal elements of $\mathbf{A}$, or any other vector, as a diagonal preconditioner.
M = spdiags(diag(A), 0, n, n);
We now compare MINRES with and without the preconditioner.
b = ones(n, 1);
[x, ~, ~, ~, resid_plain] = minres(A, b, 1e-10, 400);
clf, semilogy(resid_plain)
xlabel('iteration number'), ylabel('residual norm')
title('Unpreconditioned MINRES')
[x, ~, ~, ~, resid_prec] = minres(A, b, 1e-10, 400, M);
hold on, semilogy(resid_prec)
title('Preconditioned MINRES')
legend('no prec.', 'with prec.');
The diagonal preconditioner cut down substantially on the number of iterations. The effect on the total time is less dramatic, but this is not a large version of the problem.
Example 8.8.1 (Python). Here is an SPD matrix that arises from solving partial differential equations.
from scipy.sparse import sparray
import rogues
A = rogues.wathen(60, 60)
n = A.shape[0]
print(f"Matrix is {n} x {n} with {A.nnz} nonzeros")
Matrix is 11041 x 11041 with 170161 nonzeros
There is an easy way to use the diagonal elements of $\mathbf{A}$, or any other vector, as a diagonal preconditioner.
import scipy.sparse as sp
prec = sp.diags(1 / A.diagonal(), 0)
We now compare CG with and without the preconditioner.
from numpy import ones
from numpy.linalg import norm
from timeit import default_timer as timer
from matplotlib.pyplot import semilogy, xlabel, ylabel, legend, title
from scipy.sparse.linalg import cg
b = ones(n)
hist = lambda x: resid.append(norm(b - A @ x))
resid = [norm(b)]
start = timer()
x, _ = cg(A, b, rtol=1e-4, maxiter=200, callback=hist)
print(f"No preconditioner: Finished in {timer() - start:.2f} sec")
resid_plain = resid.copy()
resid = [norm(b)]
start = timer()
x, _ = cg(A, b, rtol=1e-4, maxiter=200, M=prec, callback=hist)
print(f"Diagonal preconditioner: Finished in {timer() - start:.2f} sec")
resid_prec = resid.copy()
semilogy(resid_plain, label="no preconditioner")
semilogy(resid_prec, label="diagonal preconditioner")
xlabel("iteration"), ylabel("residual norm")
legend(), title("Convergence of CG with and without preconditioning");
No preconditioner: Finished in 0.48 sec
Diagonal preconditioner: Finished in 0.07 sec
The diagonal preconditioner cut down substantially on the number of iterations and the execution time.
8.8.2 Incomplete factorization

Another general-purpose technique is the incomplete LU factorization. Because a true factorization of a sparse matrix usually creates an undesirable amount of fill-in, incomplete LU sacrifices exactness of the factors, for example by dropping elements smaller than an adjustable threshold.
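As a rough sketch of the motivation, the snippet below compares the fill-in of a complete sparse LU factorization with that of an incomplete one. The matrix is made up, and the ilu call assumes the IncompleteLU package used later in this example.

using LinearAlgebra, SparseArrays, IncompleteLU
n = 2000
A = sprandn(n, n, 0.002) + 20I    # made-up sparse, diagonally dominant matrix
F = lu(A)                         # complete sparse factorization (fill-in occurs)
iF = ilu(A, τ=0.1)                # incomplete factorization; small entries dropped
@show nnz(F.L) + nnz(F.U)         # typically much larger than nnz(A)
@show nnz(iF) / nnz(A);           # stays of modest size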
Example 8.8.2 (Incomplete LU preconditioning)
Example 8.8.2 (Julia). Here is a nonsymmetric matrix arising from a probabilistic model in computational chemistry.
using SparseArrays
A = sparse(matrixdepot("Watson/chem_master1"))
n = size(A, 1)
@show n, nnz(A), issymmetric(A)
(n, nnz(A), issymmetric(A)) = (40401, 201201, false)
Without a preconditioner, GMRES makes essentially no progress after 100 iterations.
b = rand(n)
const GMRES = IterativeSolvers.gmres
x, history = GMRES(A, b, maxiter=100, reltol=1e-5, log=true)
resnorm = history[:resnorm]
@show resnorm[end] / resnorm[1];
resnorm[end] / resnorm[1] = 0.9692853922823051
The following version of incomplete LU factorization drops all sufficiently small values (i.e., replaces them with zeros). This keeps the number of nonzeros in the factors under control.
using IncompleteLU
iLU = ilu(A, τ=0.25)
@show nnz(iLU) / nnz(A);
nnz(iLU) / nnz(A) = 9.654509669435043
The result is almost 10 times as dense as $\mathbf{A}$ and yet still not a true factorization of it. However, it's close enough for an approximate inverse in a preconditioner. The actual preconditioning matrix is $\mathbf{M}=\mathbf{L}\mathbf{U}$, but we just supply the factorization object to gmres.
_, history = GMRES(A, b, Pl=iLU, maxiter=100, reltol=1e-5, log=true)
history
Converged after 7 iterations.
The τ parameter in ilu balances the accuracy of the incomplete factorization against the time needed to compute it and apply it. As $\tau\to 0$, more of the elements are kept, making the preconditioner more effective but slower per iteration.
plt = plot(0:40, resnorm[1:41] / resnorm[1];
label="no preconditioning", legend=:bottomright,
xaxis=("iteration number"),
yaxis=(:log10, "residual norm"),
title="Incomplete LU preconditioning")
for τ in [2, 1, 0.25, 0.1]
t = @elapsed iLU = ilu(A; τ)
t += @elapsed _, history = GMRES(A, b, Pl=iLU, maxiter=100,
reltol=1e-5, log=true)
resnorm = history[:resnorm]
label = "τ = $τ, time = $(round(t,digits=3))"
plot!(0:length(resnorm)-1, resnorm / resnorm[1]; label)
end
plt
In any given problem, it’s impossible to know in advance where the right balance lies between fidelity and speed for the preconditioner.
Example 8.8.2 (MATLAB). Here is a random nonsymmetric matrix.
n = 8000;
A = speye(n) + sprand(n, n, 0.00035);
Without a preconditioner, restarted GMRES makes slow progress.
b = rand(n, 1);
[x, ~, ~, ~, resid_plain] = gmres(A, b, 50, 1e-10, 3); % restart at 50
format short e
resid_plain(1:30:end)
This version of incomplete LU factorization simply prohibits fill-in, freezing the sparsity pattern of the approximate factors to match that of the original matrix.
[L, U] = ilu(A);
clf
subplot(121), spy(L)
title('L')
subplot(122), spy(U)
title('U')
disp(sprintf("There are %d nonzeros in A", nnz(A)))
There are 30393 nonzeros in A
It does not produce a true factorization of $\mathbf{A}$. The actual preconditioning matrix is $\mathbf{M}=\mathbf{L}\mathbf{U}$. However, the gmres function allows setting the preconditioner by giving the factors independently.
[x, ~, ~, ~, resid_prec] = gmres(A, b, [], 1e-10, 300, L, U);
The preconditioner makes a significant difference in the number of iterations needed.
clf, semilogy(resid_plain)
hold on, semilogy(resid_prec)
xlabel('iteration number'), ylabel('residual norm')
title('Preconditioned GMRES')
legend('no preconditioner', 'with preconditioner');
Example 8.8.2 (Python). Here is a random nonsymmetric matrix.
import scipy.sparse as sp
n = 8000
A = 2.8 * sp.eye(n) + sp.rand(n, n, 0.002)
Without a preconditioner, GMRES can solve a system with this matrix.
from numpy import random
from numpy.linalg import norm
from timeit import default_timer as timer
from scipy.sparse.linalg import gmres
b = random.rand(n)
hist = lambda rvec: resid.append(norm(rvec))
resid = [1.]
start = timer()
x, flag = gmres(A, b, maxiter=300, rtol=1e-10, restart=50, callback=hist)
print(f"time for plain GMRES: {timer() - start:.3f} sec")
resid_plain = resid.copy()
time for plain GMRES: 0.032 sec
The following version of incomplete LU factorization drops all sufficiently small values (i.e., replaces them with zeros). This keeps the number of nonzeros in the factors under control.
from scipy.sparse.linalg import spilu
iLU = spilu(A, drop_tol=0.2)
print(f"Factors have {iLU.nnz} nonzeros, while A has {A.nnz}")
Factors have 96009 nonzeros, while A has 135982
The result is not a true factorization of the original matrix. However, it’s close enough for an approximate inverse in a preconditioner.
from scipy.sparse.linalg import LinearOperator
prec = LinearOperator((n, n), matvec=lambda y: iLU.solve(y))
resid = [1.]; start = timer()
x, flag = gmres(A, b, M=prec, maxiter=300, rtol=1e-10, restart=50, callback=hist)
print(f"time for preconditioned GMRES: {timer() - start:.3f} sec")
resid_prec = resid
time for preconditioned GMRES: 0.023 sec
from matplotlib.pyplot import semilogy, xlabel, ylabel, legend, title
semilogy(resid_plain, label="no prec.")
semilogy(resid_prec, label="iLU prec.")
xlabel("iteration number"), ylabel("residual norm")
legend()
title("GMRES convergence compared");
In practice, good preconditioning is often as important as, if not more important than, the specific choice of Krylov method. Effective preconditioning may require deep understanding of the underlying application, however, which limits our ability to go into further detail here. For instance, the linear system may be an approximation of a continuous mathematical model, in which case $\mathbf{M}$ can be derived from a cruder form of that approximation. Krylov methods offer a natural way to exploit these and other approximate inverses.
8.8.3 Exercises

1. ⌨ The object returned by ilu stores the factors in a way that optimizes sparse triangular substitution. You can recover the factors themselves via
iLU = ilu(A,τ=0.1) # for example
L, U = I+iLU.L, iLU.U'
In this problem, use A = 1.5I + sprand(800,800,0.005).
(a) Using $\tau=0.3$ for the factorization, plot the eigenvalues of $\mathbf{A}$ and of $\mathbf{M}^{-1}\mathbf{A}$ in the complex plane on side-by-side subplots. Do they support the notion that $\mathbf{M}^{-1}\mathbf{A}$ is "more like" an identity matrix than $\mathbf{A}$ is? (Hint: the matrices are small enough to convert to standard dense form for the use of eigvals.)
(b) Repeat part (a) for $\tau=0.03$. Is $\mathbf{M}$ more accurate than in part (a), or less?
2. ⌨ (Continuation of Exercise 8.5.5.) Let $\mathbf{B}$ be diagm(1:100), let $\mathbf{I}$ be I(100), and let $\mathbf{Z}$ be a $100\times 100$ matrix of zeros. Define

$$\mathbf{A} = \begin{bmatrix} \mathbf{B} & \mathbf{I} \\ \mathbf{Z} & -\mathbf{B} \end{bmatrix}$$

and let $\mathbf{b}$ be a 200-vector of ones. The matrix $\mathbf{A}$ is difficult for GMRES.
(a) Design a diagonal preconditioner $\mathbf{M}$, with all diagonal elements equal to 1 or $-1$, such that $\mathbf{M}^{-1}\mathbf{A}$ has all positive eigenvalues. Apply gmres without restarts, using this preconditioner and a tolerance of $10^{-10}$, for 100 iterations. Plot the convergence curve.
(b) Now design another diagonal preconditioner such that all the eigenvalues of $\mathbf{M}^{-1}\mathbf{A}$ are 1, and apply preconditioned gmres again. How many iterations are apparently needed for convergence?
3. ⌨ Let A = matrixdepot("Bai/rdb2048"), and let b be a vector of 2048 ones. In the steps below, use GMRES for up to 300 iterations without restarts and with a stopping tolerance of $10^{-4}$.
(a) Time the GMRES solution without preconditioning. Verify that convergence was achieved.
(b) Show that diagonal preconditioning is not helpful for this problem.
(c) To two digits, find a value of $\tau$ in ilu at which the preconditioned method transitions from being effective and faster than in part (a) to being ineffective.