
MINRES and conjugate gradients

We have seen before that certain matrix properties enhance solutions to linear algebra problems. One of the most important of these is when $\mathbf{A}^*=\mathbf{A}$; i.e., $\mathbf{A}$ is hermitian. The Arnoldi iteration has a particularly useful specialization to this case. While in this section we describe the resulting algorithms, we do not present them in detail or show full implementations.

8.6.1 Lanczos iteration

Starting from (8.4.10), we left-multiply by $\mathbf{Q}_m^*$ to get

$$\mathbf{Q}_m^* \mathbf{A} \mathbf{Q}_m = \mathbf{Q}_m^* \mathbf{Q}_{m+1} \mathbf{H}_m = \tilde{\mathbf{H}}_m,$$

where $\tilde{\mathbf{H}}_m$ is rows 1 through $m$ of $\mathbf{H}_m$. If $\mathbf{A}$ is hermitian, then so is the left side of this equation, hence $\tilde{\mathbf{H}}_m$ is hermitian too. But it is also upper Hessenberg, meaning that the $(i,j)$ element is zero if $i > j+1$. By symmetry, this means that elements are zero when $j > i+1$ as well; that is, $\tilde{\mathbf{H}}_m$ is tridiagonal.

Equation (8.4.6) of the Arnoldi iteration now simplifies to a much shorter expression:

$$\mathbf{A} \mathbf{q}_m = H_{m-1,m} \,\mathbf{q}_{m-1} + H_{mm} \,\mathbf{q}_m + H_{m+1,m}\,\mathbf{q}_{m+1}.$$

As before in deriving the Arnoldi iteration, when given the first $m$ vectors we can solve for the entries in column $m$ of $\mathbf{H}$ and then for $\mathbf{q}_{m+1}$. The resulting process is known as the Lanczos iteration. Its most important practical advantage is that while Arnoldi needs $O(m)$ steps to get $\mathbf{q}_{m+1}$ from the previous vectors, Lanczos needs only $O(1)$ steps, so restarting isn't required for symmetric matrices.[1]
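
The three-term recurrence translates directly into code. Below is a minimal sketch in Python with NumPy, assuming a real symmetric matrix; the function name `lanczos` and its interface are ours for illustration, and it omits the reorthogonalization that a numerically robust version needs (see the footnote).

```python
import numpy as np

def lanczos(A, b, m):
    # m steps of the Lanczos iteration for real symmetric A, without
    # reorthogonalization. Returns orthonormal columns Q and the
    # tridiagonal entries alpha (diagonal) and beta (off-diagonal).
    n = len(b)
    Q = np.zeros((n, m + 1))
    alpha = np.zeros(m)
    beta = np.zeros(m)
    Q[:, 0] = b / np.linalg.norm(b)
    for k in range(m):
        v = A @ Q[:, k]
        alpha[k] = Q[:, k] @ v
        # Three-term recurrence: only the last two basis vectors enter.
        v -= alpha[k] * Q[:, k]
        if k > 0:
            v -= beta[k - 1] * Q[:, k - 1]
        beta[k] = np.linalg.norm(v)   # zero here signals an invariant subspace
        Q[:, k + 1] = v / beta[k]
    return Q, alpha, beta
```

Each pass through the loop costs $O(1)$ vector operations beyond the product with $\mathbf{A}$, in contrast to the $O(m)$ orthogonalization work in step $m$ of Arnoldi.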

8.6.2 MINRES

When $\mathbf{A}$ is hermitian and the Arnoldi iteration is reduced to Lanczos, the analog of GMRES is known as MINRES. Like GMRES, MINRES minimizes the residual $\|\mathbf{b}-\mathbf{A}\mathbf{x}\|$ over increasingly larger Krylov spaces.
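
For experimentation, SciPy provides an implementation as `scipy.sparse.linalg.minres`. A small sketch on a random symmetric (and almost surely indefinite) matrix, where the matrix construction and the residual bookkeeping are our illustrative choices:

```python
import numpy as np
from scipy.sparse.linalg import minres

rng = np.random.default_rng(1)
n = 200
B = rng.standard_normal((n, n))
A = B + B.T                      # symmetric, almost surely indefinite
b = rng.standard_normal(n)

resid = []                       # record the residual norm at each iteration
x, info = minres(A, b, callback=lambda xk: resid.append(np.linalg.norm(b - A @ xk)))
print(info, resid[0], resid[-1])  # info == 0 on success; the norms decrease
```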

MINRES is also more theoretically tractable than GMRES. The underlying result relies on some advanced approximation theory. Recall that the eigenvalues of a hermitian matrix are real. If $\mathbf{A}$ is hermitian, invertible, and indefinite, separate its eigenvalues into positive and negative subsets, and let $\kappa_+$ and $\kappa_-$ be the ratios of largest to smallest magnitude within each subset. Then the MINRES residual satisfies the upper bound

$$\frac{\|\mathbf{r}_m\|}{\|\mathbf{b}\|} \le \left( \frac{\sqrt{\kappa_+\kappa_-} - 1}{\sqrt{\kappa_+\kappa_-} + 1} \right)^{\lfloor m/2 \rfloor}.$$

The bound for a definite matrix is better, as shown in the Convergence section below. The upper bound (8.6.4) on the residual obeys a linear convergence rate. As the product $\kappa_+\kappa_-$ grows, the rate of this convergence approaches 1. Hence, the presence of eigenvalues close to the origin (relative to the largest eigenvalues) is expected to force slower convergence.

8.6.3 Conjugate gradients

Given positive definiteness in addition to symmetry, we arrive at perhaps the most famous Krylov subspace method for $\mathbf{A}\mathbf{x}=\mathbf{b}$, called conjugate gradients.
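
The standard (unpreconditioned) CG recurrence is short enough to sketch directly. This is the classic textbook loop for a real symmetric positive definite matrix, starting from the zero vector; the function name and the fixed iteration count are ours for illustration:

```python
import numpy as np

def conjugate_gradients(A, b, m):
    # m steps of unpreconditioned CG for real SPD A, starting from x = 0.
    x = np.zeros_like(b)
    r = b.copy()                  # residual b - A @ x
    p = r.copy()                  # search direction
    rho = r @ r
    for _ in range(m):
        Ap = A @ p
        alpha = rho / (p @ Ap)    # step length along p
        x += alpha * p
        r -= alpha * Ap
        rho_new = r @ r
        p = r + (rho_new / rho) * p   # new direction, A-conjugate to p
        rho = rho_new
    return x
```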

Suppose now that $\mathbf{A}$ is hermitian and positive definite (HPD). Then $\mathbf{A}$ has a Cholesky factorization, which in the complex case is $\mathbf{A}=\mathbf{R}^*\mathbf{R}$. Therefore, for any vector $\mathbf{u}$,

$$\mathbf{u}^*\mathbf{A}\mathbf{u} = (\mathbf{R}\mathbf{u})^*(\mathbf{R}\mathbf{u})=\|\mathbf{R} \mathbf{u}\|^2,$$

which is nonnegative and zero only when $\mathbf{u}=\boldsymbol{0}$, provided $\mathbf{A}$ (and therefore $\mathbf{R}$) is nonsingular. Hence, we can define a special vector norm relative to $\mathbf{A}$:

$$\| \mathbf{u} \|_{\mathbf{A}} = \left( \mathbf{u}^*\mathbf{A}\mathbf{u} \right)^{1/2}.$$
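
This identity is easy to check numerically. In the sketch below (a real SPD test matrix of our own construction), note that `numpy.linalg.cholesky` returns the lower-triangular factor, so $\mathbf{R}$ is its transpose in the real case:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
C = rng.standard_normal((n, n))
A = C.T @ C + n * np.eye(n)      # symmetric positive definite
u = rng.standard_normal(n)

R = np.linalg.cholesky(A).T      # A = R.T @ R with R upper triangular
print(np.linalg.norm(R @ u))     # ||u||_A computed via the factor
print(np.sqrt(u @ A @ u))        # matches (u^T A u)^(1/2)
```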

8.6.4 Convergence

The convergence of CG and MINRES depends on the eigenvalues of $\mathbf{A}$. In the HPD case the eigenvalues are real and positive, and they equal the singular values. Hence the condition number $\kappa$ is equal to the ratio of the largest eigenvalue to the smallest one. The following comparison suggests that MINRES and CG are not so different in convergence.

Theorem 8.6.2 characterizes the convergence of MINRES and CG similarly, bounding the relative residual for MINRES and the relative $\mathbf{A}$-norm of the error for CG by the same quantity,

$$2 \left( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \right)^m,$$

given in (8.6.10). While these are different quantities, in practice one may not find a consistent advantage for one method over the other.
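
One way to see this in practice is to run both methods on the same SPD system and compare their histories; a brief sketch using SciPy's `cg` and `minres`, with a test matrix of our own construction:

```python
import numpy as np
from scipy.sparse.linalg import cg, minres

rng = np.random.default_rng(2)
n = 500
C = rng.standard_normal((n, n))
A = C.T @ C + n * np.eye(n)      # symmetric positive definite
b = rng.standard_normal(n)

hist = {"cg": [], "minres": []}
cg(A, b, callback=lambda xk: hist["cg"].append(np.linalg.norm(b - A @ xk)))
minres(A, b, callback=lambda xk: hist["minres"].append(np.linalg.norm(b - A @ xk)))
print(len(hist["cg"]), len(hist["minres"]))   # comparable iteration counts
```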

As in the indefinite case with MINRES, a larger condition number is associated with slower convergence in the positive definite case. Specifically, to make the bound in (8.6.10) less than a number $\epsilon$ requires

$$
\begin{gather*}
2 \left( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \right)^m \approx \epsilon, \\
m \log \left( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \right) \approx \log\left( \frac{\epsilon}{2} \right).
\end{gather*}
$$

We estimate

$$
\begin{align*}
\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} &= (1 - \kappa^{-1/2}\,) (1 + \kappa^{-1/2}\,)^{-1}\\
&= (1 - \kappa^{-1/2}\,) (1 - \kappa^{-1/2} + \kappa^{-1} + \cdots)\\
&= 1 - 2\kappa^{-1/2} + O(\kappa^{-1}) \quad \text{ as $\kappa \rightarrow \infty$.}
\end{align*}
$$

With the Taylor expansion $\log(1+x) = x - x^2/2 + \cdots$, we finally conclude

$$-2 m \kappa^{-1/2} \approx \log\left( \frac{\epsilon}{2} \right), \quad \text{or} \quad m = O(\sqrt{\kappa}),$$

as an estimate of the number of iterations needed to achieve a fixed accuracy.
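
Evaluating this estimate for a fixed target, say $\epsilon = 10^{-6}$ (our illustrative choice), shows the $O(\sqrt{\kappa})$ growth:

```python
import numpy as np

eps = 1e-6
for kappa in [1e2, 1e4, 1e6]:
    # from -2 m kappa^(-1/2) ≈ log(eps/2):
    m = np.log(2 / eps) * np.sqrt(kappa) / 2
    print(f"kappa = {kappa:.0e}:  m ≈ {m:.0f}")
```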

This estimate fails for very large $\kappa$, however.

8.6.5 Exercises

Footnotes
  1. In principle, the implementation of the Lanczos iteration is a minor change from Arnoldi, but numerical stability requires some extra analysis and effort. We do not present the details.