Matrix-free iterations Tobin A. Driscoll Richard J. Braun
In Chapter 4, we solved the nonlinear rootfinding problem f ( x ) = 0 \mathbf{f}(\mathbf{x})=\boldsymbol{0} f ( x ) = 0 with methods that needed only the ability to evaluate f \mathbf{f} f at any known value of x \mathbf{x} x . By repeatedly evaluating f \mathbf{f} f at cleverly chosen points, these algorithms were able to return an estimate for f − 1 ( 0 ) \mathbf{f}^{-1}(\boldsymbol{0}) f − 1 ( 0 ) .
We can explore the same idea in the context of linear algebra by shifting our viewpoint from matrices to linear transformations . If we define f ( x ) = A x \mathbf{f}(\mathbf{x})=\mathbf{A}\mathbf{x} f ( x ) = Ax , then for all vectors x \mathbf{x} x , y \mathbf{y} y , and scalars α,
f ( x + y ) = f ( x ) + f ( y ) , f ( α x ) = α f ( x ) . \begin{split}
\mathbf{f}(\mathbf{x} + \mathbf{y} ) &= \mathbf{f}(\mathbf{x}) + \mathbf{f}(\mathbf{y} ), \\
\mathbf{f}(\alpha \mathbf{x} ) & = \alpha\, \mathbf{f}(\mathbf{x}).
\end{split} f ( x + y ) f ( α x ) = f ( x ) + f ( y ) , = α f ( x ) . These properties define a linear transformation. Moreover, every linear transformation between finite-dimensional vector spaces can be represented as a matrix-vector multiplication.
A close examination reveals that in the power iteration and Krylov subspace methods, the only appearance of the matrix A \mathbf{A} A is to apply it a known vector, i.e., to evaluate the linear transformation f ( x ) = A x \mathbf{f}(\mathbf{x})=\mathbf{A}\mathbf{x} f ( x ) = Ax . If we have access to f \mathbf{f} f , we don’t need the matrix at all! That is, Krylov subspace methods can be used to invert a linear transformation if one provides code for the transformation, even if its associated matrix is not known explicitly. That may sound like a strange situation, but it is not uncommon.
8.7.1 Blurring images ¶ In From matrix to insight we saw that a grayscale image can be represented as an m × n m\times n m × n matrix X \mathbf{X} X of pixel intensity values. Now consider a simple model for blurring the image. Define B \mathbf{B} B as the m × m m\times m m × m tridiagonal matrix
B i j = { 1 2 if i = j , 1 4 if ∣ i − j ∣ = 1 , 0 otherwise. B_{ij} =
\begin{cases}
\tfrac{1}{2} & \text{if $i=j$},\\
\tfrac{1}{4} & \text{if $|i-j|=1$},\\
0 & \text{otherwise.}
\end{cases} B ij = ⎩ ⎨ ⎧ 2 1 4 1 0 if i = j , if ∣ i − j ∣ = 1 , otherwise. The product B X \mathbf{B}\mathbf{X} BX applies B \mathbf{B} B to each column of X \mathbf{X} X . Within that column it does a weighted average of the values of each pixel and its two neighbors. That has the effect of blurring the image vertically. We can increase the amount of blur by applying B \mathbf{B} B repeatedly.
In order to blur horizontally, we can transpose the image and apply blurring in the same way. We need a blurring matrix defined as in (8.7.2) but with size n × n n\times n n × n . We call this matrix C \mathbf{C} C . Altogether the horizontal blurring is done by transposing, applying C \mathbf{C} C , and transposing back to the original orientation. That is,
( C X T ) T = X C T = X C , \bigl(\mathbf{C} \mathbf{X}^T\bigr)^T = \mathbf{X}\mathbf{C}^T = \mathbf{X}\mathbf{C}, ( C X T ) T = X C T = XC , using the symmetry of C \mathbf{C} C . So we can describe blur in both directions as the function
blur ( X ) = B k X C k \operatorname{blur}(\mathbf{X}) = \mathbf{B}^k \mathbf{X} \mathbf{C}^k blur ( X ) = B k X C k for a positive integer k k k .
Example 8.7.1 (Blurring an image)
Example 8.7.1 We use a readily available test image.
using Images, TestImages
img = testimage("mandrill")
m, n = size(img)
X = @. Float64(Gray(img))
plot(Gray.(X), title="Original image", aspect_ratio=1)
We define the one-dimensional tridiagonal blurring matrices.
using SparseArrays
function blurmatrix(d)
v1 = fill(0.25, d - 1)
return spdiagm(0 => fill(0.5, d), 1 => v1, -1 => v1)
end
B, C = blurmatrix(m), blurmatrix(n);
Finally, we show the results of using k = 12 k=12 k = 12 repetitions of the blur in each direction.
using Plots
blur = X -> B^12 * X * C^12;
Z = blur(X)
plot(Gray.(Z), title="Blurred image")
Example 8.7.1 We use a readily available test image.
load mandrill
[m, n] = size(X);
clf
imshow(X, [0, 255])
title('Original image') % ignore this
We define the one-dimensional tridiagonal blurring matrices.
v = [1/4, 1/2, 1/4];
B = spdiags(v, -1:1, m, m);
C = spdiags(v, -1:1, n, n);
Finally, we show the results of using k = 12 k=12 k = 12 repetitions of the blur in each direction.
blur = @(X) B^12 * X * C^12;
imshow(blur(X), [0, 255])
title(('Blurred image'));
Example 8.7.1 We use a readily available test image.
from skimage import data as testimages
from skimage.color import rgb2gray
img = getattr(testimages, "coffee")()
X = rgb2gray(img)
imshow(X, cmap="gray");
We define the one-dimensional tridiagonal blurring matrices.
import scipy.sparse as sp
def blurmatrix(d):
data = [[0.25] * (d-1), [0.5] * d, [0.25] * (d-1)]
return sp.diags(data, [-1, 0, 1], shape=(d, d))
m, n = X.shape
B = blurmatrix(m)
C = blurmatrix(n)
Finally, we show the results of using k = 12 k=12 k = 12 repetitions of the blur in each direction.
from scipy.sparse.linalg import matrix_power
blur = lambda X: matrix_power(B, 12) @ X @ matrix_power(C, 12)
imshow(blur(X), cmap="gray")
title("Blurred image");
8.7.2 Deblurring ¶ A more interesting operation is deblurring : given an image blurred by poor focus, can we reconstruct the true image? Conceptually, we want to invert the function blur ( X ) \operatorname{blur}(\mathbf{X}) blur ( X ) .
It’s easy to see from (8.7.4) that the blur operation is a linear transformation on image matrices. But an m × n m\times n m × n image matrix is equivalent to a length-m n mn mn vector—it’s just a matter of interpreting the shape of the same data. Let vec ( X ) = x \operatorname{vec}(\mathbf{X})=\mathbf{x} vec ( X ) = x and unvec ( x ) = X \operatorname{unvec}(\mathbf{x})=\mathbf{X} unvec ( x ) = X be the mathematical statements of such reshaping operations. Now say X \mathbf{X} X is the original image and Z = blur ( X ) \mathbf{Z}=\operatorname{blur}(\mathbf{X}) Z = blur ( X ) is the blurred one. Then by linearity there is some matrix A \mathbf{A} A such that
A vec ( X ) = vec ( Z ) , \mathbf{A} \operatorname{vec}(\mathbf{X}) = \operatorname{vec}(\mathbf{Z}), A vec ( X ) = vec ( Z ) , or A x = z \mathbf{A}\mathbf{x}=\mathbf{z} Ax = z .
The matrix A \mathbf{A} A is m n × m n mn\times mn mn × mn ; for a 12-megapixel image, it would have 1.4 × 1 0 14 1.4\times 10^{14} 1.4 × 1 0 14 entries! Admittedly, it is extremely sparse, but the point is that we don’t need it at all.
Instead, given any vector u \mathbf{u} u we can compute v = A u \mathbf{v}=\mathbf{A}\mathbf{u} v = Au through the steps
U = unvec ( u ) , V = blur ( U ) , v = vec ( V ) . \begin{align*}
\mathbf{U} &= \operatorname{unvec}(\mathbf{u}), \\
\mathbf{V} &= \operatorname{blur}(\mathbf{U}), \\
\mathbf{v} &= \operatorname{vec}(\mathbf{V}).
\end{align*} U V v = unvec ( u ) , = blur ( U ) , = vec ( V ) . The following example shows how to put these ideas into practice with MINRES.
Example 8.7.2 (Deblurring an image)
Example 8.7.2 We repeat the earlier process to blur an original image X \mathbf{X} X to get Z \mathbf{Z} Z .
Sourceimg = testimage("lighthouse")
m, n = size(img)
X = @. Float64(Gray(img))
B = spdiagm(0 => fill(0.5, m),
1 => fill(0.25, m - 1), -1 => fill(0.25, m - 1))
C = spdiagm(0 => fill(0.5, n),
1 => fill(0.25, n - 1), -1 => fill(0.25, n - 1))
blur = X -> B^12 * X * C^12
Z = blur(X)
plot(Gray.(Z), title="Blurred image")
Now we imagine that X \mathbf{X} X is unknown and that we want to recover it from Z \mathbf{Z} Z . We first need functions that translate between vector and matrix representations.
# vec (built-in) converts matrix to vector
unvec = z -> reshape(z, m, n); # convert vector to matrix
Now we declare the three-step blur transformation as a LinearMap
, supplying also the size of the vector form of an image.
using LinearMaps
T = LinearMap(x -> vec(blur(unvec(x))), m * n);
The blurring operators are symmetric, so we apply minres
to the composite blurring transformation T
.
Tip
The function clamp01
in Images
restricts values to be in the interval [ 0 , 1 ] [0,1] [ 0 , 1 ] .
using IterativeSolvers
y = minres(T, vec(Z), maxiter=50, reltol=1e-5);
Y = unvec(clamp01.(y))
plot(Gray.(X), layout=2, title="Original")
plot!(Gray.(Y), subplot=2, title="Deblurred")
Example 8.7.2 We repeat the earlier process to blur an original image X \mathbf{X} X to get Z \mathbf{Z} Z .
Sourceload mandrill
[m, n] = size(X);
v = [1/4, 1/2, 1/4];
B = spdiags(v, -1:1, m, m);
C = spdiags(v, -1:1, n, n);
blur = @(X) B^12 * X * C^12;
Z = blur(X);
clf, imshow(Z, [0, 255])
title(("Blurred image"));
Now we imagine that X \mathbf{X} X is unknown and that we want to recover it from Z \mathbf{Z} Z . We first need functions that translate between vector and matrix representations.
vec = @(X) reshape(X,m*n,1);
unvec = @(x) reshape(x,m,n);
T = @(x) vec( blur(unvec(x)) );
The blurring operators are symmetric, so we apply minres
to the composite blurring transformation T
.
y = gmres(T, vec(Z), 50, 1e-5);
Y = unvec(y);
subplot(121)
imshow(X, [0, 255])
title("Original")
subplot(122)
imshow(Y, [0, 255])
title(("Deblurred"));
gmres(50) converged at outer iteration 2 (inner iteration 45) to a solution with relative residual 1e-05.
Example 8.7.2 We repeat the earlier process to blur an original image X \mathbf{X} X to get Z \mathbf{Z} Z .
Notebook Cellimg = getattr(testimages, "coffee")()
X = rgb2gray(img)
m, n = X.shape
import scipy.sparse as sp
def blurmatrix(d):
data = [[0.25] * (d-1), [0.5] * d, [0.25] * (d-1)]
return sp.diags(data, [-1, 0, 1], shape=(d, d))
B = blurmatrix(m)
C = blurmatrix(n)
from scipy.sparse.linalg import matrix_power
blur = lambda X: matrix_power(B, 12) @ X @ matrix_power(C, 12)
Z = blur(X)
imshow(Z, cmap="gray")
title("Blurred image");
Now we imagine that X \mathbf{X} X is unknown and that we want to recover it from Z \mathbf{Z} Z . We first need functions that translate between vector and matrix representations.
from scipy.sparse.linalg import LinearOperator
vec = lambda Z: Z.reshape(m * n)
unvec = lambda z: z.reshape(m, n)
xform = lambda x: vec(blur(unvec(x)))
Now we declare the three-step blur transformation as a LinearOperator
, supplying also the size of the vector form of an image.
T = LinearOperator((m * n, m * n), matvec=xform)
The blurring operators are symmetric, so we apply minres
to the composite blurring transformation T
.
from scipy.sparse.linalg import gmres
y, flag = gmres(T, vec(Z), rtol=1e-5, maxiter=50)
Y = unvec(maximum(0, minimum(1, y)))
subplot(1, 2, 1), imshow(X, cmap="gray")
axis("off"), title("Original")
subplot(1, 2, 2), imshow(Y, cmap="gray")
axis("off"), title("Deblurred");
8.7.3 Exercises ¶ ✍ In each case, state with reasons whether the given transformation on n n n -vectors is linear.
(a) f ( x ) = [ x 2 x 3 ⋮ x n x 1 ] \,\mathbf{f}(\mathbf{x}) = \begin{bmatrix} x_2\\x_3 \\\vdots\\ x_n \\ x_1 \end{bmatrix}\qquad f ( x ) = ⎣ ⎡ x 2 x 3 ⋮ x n x 1 ⎦ ⎤
(b) f ( x ) = [ x 1 x 1 + x 2 x 1 + x 2 + x 3 ⋮ x 1 + ⋯ + x n ] \,\mathbf{f}(\mathbf{x}) = \begin{bmatrix} x_1\\x_1+x_2\\x_1+x_2+x_3\\\vdots\\x_1+\cdots+x_n \end{bmatrix} \qquad f ( x ) = ⎣ ⎡ x 1 x 1 + x 2 x 1 + x 2 + x 3 ⋮ x 1 + ⋯ + x n ⎦ ⎤
(c) f ( x ) = [ x 1 + 1 x 2 + 2 x 3 + 3 ⋮ x n + n ] \,\mathbf{f}(\mathbf{x}) = \begin{bmatrix} x_1 + 1 \\x_2 + 2 \\ x_3 + 3 \\\vdots \\ x_n+n \end{bmatrix} \qquad f ( x ) = ⎣ ⎡ x 1 + 1 x 2 + 2 x 3 + 3 ⋮ x n + n ⎦ ⎤
(d) f ( x ) = ∥ x ∥ ∞ e 1 \,\mathbf{f}(\mathbf{x}) = \|\mathbf{x}\|_\infty\, \mathbf{e}_1 f ( x ) = ∥ x ∥ ∞ e 1
The condition number of the matrix of the blur transformation is related to the condition numbers of the single-dimension matrices B k \mathbf{B}^k B k and C k \mathbf{C}^k C k in (8.7.4) .
(a) ⌨ Let m = 50 m=50 m = 50 . Show that B \mathbf{B} B has a Cholesky factorization and thus is SPD . Find κ ( B ) \kappa(\mathbf{B}) κ ( B ) . (Note: cond
requires a regular dense matrix, not a sparse matrix.)
(b) ✍ Explain why part (a) implies κ ( B k ) = κ ( B ) k \kappa( \mathbf{B}^k ) = \kappa(\mathbf{B})^k κ ( B k ) = κ ( B ) k .
(c) ✍ Explain two important effects of the limit k → ∞ k\to \infty k → ∞ on deblurring by Krylov methods.
The cumulative summation function cumsum
is defined as
f ( x ) = [ x 1 x 1 + x 2 ⋮ x 1 + x 2 + ⋯ + x n ] . \mathbf{f}(\mathbf{x}) = \begin{bmatrix} x_1 \\ x_1+x_2 \\ \vdots \\ x_1 + x_2 + \cdots + x_n \end{bmatrix}. f ( x ) = ⎣ ⎡ x 1 x 1 + x 2 ⋮ x 1 + x 2 + ⋯ + x n ⎦ ⎤ . (a) ✍ Show that f \mathbf{f} f is a linear transformation.
(b) ⌨ Define vector b \mathbf{b} b by b i = ( i / 100 ) 2 b_i = (i/100)^2 b i = ( i /100 ) 2 for i = 1 , … , 100 i=1,\ldots,100 i = 1 , … , 100 . Then use gmres
to find x = f − 1 ( b ) \mathbf{x}=\mathbf{f}^{-1}(\mathbf{b}) x = f − 1 ( b ) .
(c) ⌨ Plot x \mathbf{x} x , and explain why the result looks as it does.