
Conditioning of linear systems

We are ready to consider the conditioning of solving the square linear system $\mathbf{A}\mathbf{x}=\mathbf{b}$. In this problem, the data are $\mathbf{A}$ and $\mathbf{b}$, and the solution is $\mathbf{x}$. Both data and result are multidimensional, so we will use norms to measure their magnitudes.

The motivation for the definition of relative condition number in Chapter 1 was to quantify the response of the result to perturbations of the data. For simplicity, we start by allowing perturbations to $\mathbf{b}$ only, while $\mathbf{A}$ remains fixed.

Let $\mathbf{A}\mathbf{x}=\mathbf{b}$ be perturbed to

$$\mathbf{A}(\mathbf{x}+\mathbf{h}) = \mathbf{b}+\mathbf{d}.$$

The condition number should be the relative change in the solution divided by the relative change in the data,

$$\frac{\quad\dfrac{\| \mathbf{h} \| }{\| \mathbf{x} \| }\quad}{\dfrac{\| \mathbf{d} \| }{\| \mathbf{b} \|}} = \frac{\| \mathbf{h} \|\;\| \mathbf{b} \| }{\| \mathbf{d} \|\; \| \mathbf{x} \| }.$$

We can bound $\| \mathbf{h} \|$ in terms of $\| \mathbf{d} \|$:

$$\begin{split} \mathbf{A}\mathbf{x} + \mathbf{A} \mathbf{h} &= \mathbf{b} + \mathbf{d}, \\ \mathbf{A} \mathbf{h} &= \mathbf{d},\\ \mathbf{h} &= \mathbf{A}^{-1} \mathbf{d},\\ \| \mathbf{h} \| &\le \| \mathbf{A}^{-1}\| \,\| \mathbf{d} \|, \end{split}$$

where we have applied $\mathbf{A}\mathbf{x}=\mathbf{b}$ and (2.7.8). Since also $\mathbf{b}=\mathbf{A}\mathbf{x}$ implies $\| \mathbf{b} \|\le \| \mathbf{A} \|\, \| \mathbf{x} \|$, we derive

$$\frac{\| \mathbf{h} \|\; \| \mathbf{b} \|}{\| \mathbf{d} \|\; \| \mathbf{x} \|} \le \frac{\bigl(\| \mathbf{A}^{-1} \|\, \| \mathbf{d} \|\bigr) \bigl(\| \mathbf{A} \|\,\| \mathbf{x} \|\bigr)}{\| \mathbf{d} \|\,\| \mathbf{x} \|} = \| \mathbf{A}^{-1}\| \, \| \mathbf{A} \|.$$

It is possible to show that this bound is tight, in the sense that the inequalities are in fact equalities for some choices of $\mathbf{b}$ and $\mathbf{d}$. This result motivates a new definition.
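The bound above is easy to probe numerically. The following sketch (assuming NumPy; the matrix and perturbation are arbitrary choices for illustration) computes the amplification ratio for a random perturbation of $\mathbf{b}$ and compares it to $\| \mathbf{A}^{-1} \|\,\| \mathbf{A} \|$:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
b = rng.standard_normal(4)

# ||A^{-1}|| * ||A|| in the induced 2-norm.
kappa = np.linalg.norm(np.linalg.inv(A), 2) * np.linalg.norm(A, 2)

x = np.linalg.solve(A, b)
d = 1e-8 * rng.standard_normal(4)   # small perturbation of b
h = np.linalg.solve(A, d)           # A(x+h) = b+d  implies  Ah = d

# Amplification ratio ||h|| ||b|| / (||d|| ||x||); it never exceeds kappa.
ratio = (np.linalg.norm(h) * np.linalg.norm(b)) / (np.linalg.norm(d) * np.linalg.norm(x))
print(f"ratio = {ratio:.3f}, kappa = {kappa:.3f}")
```

Running this for many random perturbations, the ratio stays below `kappa` but can approach it for worst-case directions of $\mathbf{d}$.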

2.8.1 Main result

The matrix condition number

$$\kappa(\mathbf{A}) = \| \mathbf{A} \| \, \| \mathbf{A}^{-1} \| \qquad (2.8.5)$$

is equal to the condition number of solving a linear system of equations. Although we derived this fact above only for perturbations of $\mathbf{b}$, a similar statement holds when $\mathbf{A}$ is perturbed.

Using a traditional Δ notation for the perturbation in a quantity, we can write the following: if $\mathbf{A}(\mathbf{x}+\Delta\mathbf{x}) = \mathbf{b}+\Delta\mathbf{b}$, then

$$\frac{\| \Delta\mathbf{x} \|}{\| \mathbf{x} \|} \le \kappa(\mathbf{A}) \, \frac{\| \Delta\mathbf{b} \|}{\| \mathbf{b} \|}, \qquad (2.8.6)$$

and if $(\mathbf{A}+\Delta\mathbf{A})(\mathbf{x}+\Delta\mathbf{x}) = \mathbf{b}$, then, in the limit $\| \Delta\mathbf{A} \| \to 0$,

$$\frac{\| \Delta\mathbf{x} \|}{\| \mathbf{x} \|} \le \kappa(\mathbf{A}) \, \frac{\| \Delta\mathbf{A} \|}{\| \mathbf{A} \|}.$$
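The claim for perturbations of $\mathbf{A}$ can also be checked numerically. A sketch assuming NumPy (matrices chosen arbitrarily): for a small relative perturbation of the matrix, the relative change in the solution stays within roughly $\kappa(\mathbf{A})$ times the relative change in the matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
b = rng.standard_normal(5)
x = np.linalg.solve(A, b)

dA = 1e-8 * rng.standard_normal((5, 5))   # small perturbation of A
dx = np.linalg.solve(A + dA, b) - x       # resulting change in the solution

rel_change_x = np.linalg.norm(dx) / np.linalg.norm(x)
rel_change_A = np.linalg.norm(dA, 2) / np.linalg.norm(A, 2)
kappa = np.linalg.cond(A, 2)

# Amplification of the matrix perturbation, bounded (to first order) by kappa.
print(f"solution change / matrix change = {rel_change_x / rel_change_A:.3f}")
print(f"kappa(A) = {kappa:.3f}")
```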

Note that for any induced matrix norm,

$$1 = \| \mathbf{I} \| = \| \mathbf{A} \mathbf{A}^{-1} \| \le \| \mathbf{A} \|\, \| \mathbf{A}^{-1} \| = \kappa(\mathbf{A}).$$

A condition number of 1 is the best we can hope for; in that case, the relative perturbation of the solution has the same size as that of the data. A condition number of size $10^t$ indicates that in floating-point arithmetic, roughly $t$ digits are lost (i.e., become incorrect) in computing the solution $\mathbf{x}$. And if $\kappa(\mathbf{A}) > \epsilon_\text{mach}^{-1}$, then for computational purposes the matrix is effectively singular.
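The digit-loss heuristic can be seen with a notoriously ill-conditioned example. This sketch (assuming NumPy) builds a Hilbert matrix, whose condition number grows rapidly with its size, and solves a system whose exact solution is known:

```python
import numpy as np

n = 10
# Hilbert matrix H[i, j] = 1/(i + j + 1): famously ill-conditioned.
A = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])

x_exact = np.ones(n)
b = A @ x_exact
x_computed = np.linalg.solve(A, b)

kappa = np.linalg.cond(A, 2)
rel_err = np.linalg.norm(x_computed - x_exact) / np.linalg.norm(x_exact)

# With kappa around 1e13, roughly 13 of the ~16 decimal digits are lost.
print(f"kappa = {kappa:.1e}")
print(f"relative error = {rel_err:.1e}")
```

Even though the solver commits only rounding-level backward errors, the forward error is many orders of magnitude above $\epsilon_\text{mach}$.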

2.8.2 Residual and backward error

Suppose that $\mathbf{A}\mathbf{x}=\mathbf{b}$ and $\tilde{\mathbf{x}}$ is a computed estimate of the solution $\mathbf{x}$. The most natural quantity to study is the error, $\mathbf{x}-\tilde{\mathbf{x}}$. Normally we can't compute it because we don't know the exact solution. However, we can compute something related, the residual:

$$\mathbf{r} = \mathbf{b} - \mathbf{A}\tilde{\mathbf{x}}.$$

Obviously, a zero residual means that $\tilde{\mathbf{x}}=\mathbf{x}$, and we have the exact solution. What happens more generally? Note that $\mathbf{A}\tilde{\mathbf{x}}=\mathbf{b}-\mathbf{r}$. That is, $\tilde{\mathbf{x}}$ solves the linear system problem for a right-hand side that is changed by $-\mathbf{r}$. This is precisely what is meant by backward error.
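This interpretation is easy to confirm numerically. A minimal sketch (assuming NumPy; the system is an arbitrary example): the computed solution exactly satisfies the nearby problem whose right-hand side is $\mathbf{b}-\mathbf{r}$.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
b = rng.standard_normal(5)

x_tilde = np.linalg.solve(A, b)   # computed (slightly inexact) solution
r = b - A @ x_tilde               # residual: computable without knowing x

# x_tilde solves the perturbed system A x = b - r, so the residual
# measures the backward error of the computation.
print("residual norm:", np.linalg.norm(r))
```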

Hence residual and backward error are the same thing for a linear system. What is the connection to the (forward) error? We can reconnect with (2.8.6) by the definition $\mathbf{h} = \tilde{\mathbf{x}}-\mathbf{x}$, in which case

$$\mathbf{d} = \mathbf{A}(\mathbf{x}+\mathbf{h})-\mathbf{b}=\mathbf{A}\mathbf{h} = -\mathbf{r}.$$

Thus, (2.8.6) is equivalent to

$$\frac{\| \mathbf{x}-\tilde{\mathbf{x}} \|}{\| \mathbf{x} \|} \le \kappa(\mathbf{A}) \, \frac{\| \mathbf{r} \|}{\| \mathbf{b} \|}. \qquad (2.8.11)$$

Equation (2.8.11) says that the relative error can exceed the relative residual by a factor of at most the matrix condition number. A small residual therefore guarantees a small error only when $\mathbf{A}$ is well conditioned.
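As a final check, the gap in (2.8.11) can be observed on a computed solution. A sketch assuming NumPy; the ill-conditioned matrix here is an arbitrary construction with prescribed singular values:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
# Build a matrix with condition number about 1e8 via random orthogonal factors.
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag(np.logspace(0, -8, n)) @ V.T

x_exact = rng.standard_normal(n)
b = A @ x_exact
x_tilde = np.linalg.solve(A, b)
r = b - A @ x_tilde

rel_error = np.linalg.norm(x_exact - x_tilde) / np.linalg.norm(x_exact)
rel_residual = np.linalg.norm(r) / np.linalg.norm(b)
kappa = np.linalg.cond(A, 2)

# The residual is rounding-level, but the error can be kappa times larger.
print(f"relative residual     = {rel_residual:.1e}")
print(f"relative error        = {rel_error:.1e}")
print(f"kappa * rel. residual = {kappa * rel_residual:.1e}")
```

The relative residual lands near $\epsilon_\text{mach}$, while the relative error is much larger, yet still within the bound set by $\kappa(\mathbf{A})$.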

2.8.3 Exercises