Matrix-free iterations - Fundamentals of Numerical Computation

In Chapter 4, we solved the nonlinear rootfinding problem $\mathbf{f}(\mathbf{x})=\boldsymbol{0}$ with methods that needed only the ability to evaluate $\mathbf{f}$ at any known value of $\mathbf{x}$ . By repeatedly evaluating $\mathbf{f}$ at cleverly chosen points, these algorithms were able to return an estimate for $\mathbf{f}^{-1}(\boldsymbol{0})$ .

We can explore the same idea in the context of linear algebra by shifting our viewpoint from matrices to linear transformations. If we define $\mathbf{f}(\mathbf{x})=\mathbf{A}\mathbf{x}$ , then for all vectors $\mathbf{x}$ , $\mathbf{y}$ , and scalars $\alpha$ ,

\begin{split} \mathbf{f}(\mathbf{x} + \mathbf{y} ) &= \mathbf{f}(\mathbf{x}) + \mathbf{f}(\mathbf{y} ), \\ \mathbf{f}(\alpha \mathbf{x} ) & = \alpha\, \mathbf{f}(\mathbf{x}). \end{split}

(8.7.1)

These properties define a linear transformation. Moreover, every linear transformation between finite-dimensional vector spaces can be represented as a matrix-vector multiplication.

A close examination reveals that in the power iteration and Krylov subspace methods, the only appearance of the matrix $\mathbf{A}$ is to apply it to a known vector, i.e., to evaluate the linear transformation $\mathbf{f}(\mathbf{x})=\mathbf{A}\mathbf{x}$ . If we have access to $\mathbf{f}$ , we don’t need the matrix at all! That is, Krylov subspace methods can be used to invert a linear transformation if one provides code for the transformation, even if its associated matrix is not known explicitly. That may sound like a strange situation, but it is not uncommon.
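As a minimal sketch of this idea, the Python code below runs power iteration using only a function that applies a linear transformation. The transformation chosen here is a symmetric averaging stencil (weight 1/2 on each entry and 1/4 on each neighbor), and the names `apply_stencil` and `power_iteration` are hypothetical, introduced only for illustration. The matrix of this stencil is never formed; its largest eigenvalue is known in closed form, $\tfrac{1}{2} + \tfrac{1}{2}\cos\bigl(\pi/(n+1)\bigr)$ , which gives us something to compare against.

```python
import numpy as np

def apply_stencil(x):
    """Apply the (1/4, 1/2, 1/4) averaging stencil to x without forming a matrix."""
    y = 0.5 * x                # diagonal contribution (creates a new array)
    y[1:] += 0.25 * x[:-1]     # subdiagonal contribution
    y[:-1] += 0.25 * x[1:]     # superdiagonal contribution
    return y

def power_iteration(f, n, num_iter=500):
    """Estimate the dominant eigenvalue of the linear map f on n-vectors."""
    x = np.ones(n)
    x /= np.linalg.norm(x)
    lam = 0.0
    for _ in range(num_iter):
        y = f(x)                      # the only use of the transformation
        lam = x @ y                   # Rayleigh quotient, since x has unit norm
        x = y / np.linalg.norm(y)
    return lam, x

n = 10
lam, _ = power_iteration(apply_stencil, n)
exact = 0.5 + 0.5 * np.cos(np.pi / (n + 1))
print(lam, exact)
```

The loop touches the transformation only through the call `f(x)`, which is exactly the access pattern that Krylov subspace methods need as well.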

8.7.1 Blurring images

In From matrix to insight we saw that a grayscale image can be represented as an $m\times n$ matrix $\mathbf{X}$ of pixel intensity values. Now consider a simple model for blurring the image. Define $\mathbf{B}$ as the $m\times m$ tridiagonal matrix

B_{ij} = \begin{cases} \tfrac{1}{2} & \text{if $i=j$},\\ \tfrac{1}{4} & \text{if $|i-j|=1$},\\ 0 & \text{otherwise.} \end{cases}

(8.7.2)

The product $\mathbf{B}\mathbf{X}$ applies $\mathbf{B}$ to each column of $\mathbf{X}$ . Within that column it does a weighted average of the values of each pixel and its two neighbors. That has the effect of blurring the image vertically. We can increase the amount of blur by applying $\mathbf{B}$ repeatedly.
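To see the weighted average at work, here is a small sketch in Python. It assumes a tiny $7\times 5$ image that is dark except for one bright pixel, and it builds $\mathbf{B}$ as a dense matrix through a hypothetical helper `blur_matrix`, which is reasonable only at this toy size.

```python
import numpy as np

def blur_matrix(d):
    """Dense d-by-d tridiagonal matrix with 1/2 on the diagonal and 1/4 beside it."""
    return 0.5 * np.eye(d) + 0.25 * (np.eye(d, k=1) + np.eye(d, k=-1))

m, n = 7, 5
X = np.zeros((m, n))
X[3, 2] = 1.0            # a single bright pixel

B = blur_matrix(m)
once = B @ X             # one vertical blur
twice = B @ once         # a second application spreads the pixel further

print(once[:, 2])        # the column that contains the bright pixel
print(twice[:, 2])
```

After one application the pixel's intensity is shared with its two vertical neighbors, and the second application widens the spread. The other columns stay dark, since $\mathbf{B}$ acts on each column independently.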

In order to blur horizontally, we can transpose the image and apply blurring in the same way. We need a blurring matrix defined as in (8.7.2) but with size $n\times n$ . We call this matrix $\mathbf{C}$ . Altogether the horizontal blurring is done by transposing, applying $\mathbf{C}$ , and transposing back to the original orientation. That is,

\bigl(\mathbf{C} \mathbf{X}^T\bigr)^T = \mathbf{X}\mathbf{C}^T = \mathbf{X}\mathbf{C},

(8.7.3)

using the symmetry of $\mathbf{C}$ . So we can describe blur in both directions as the function

\operatorname{blur}(\mathbf{X}) = \mathbf{B}^k \mathbf{X} \mathbf{C}^k

(8.7.4)

for a positive integer $k$ .
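The identities (8.7.3) and (8.7.4) can be checked numerically. The sketch below assumes a small random matrix standing in for an image and dense blurring matrices from a hypothetical helper `blur_matrix`. It confirms that transposing, applying $\mathbf{C}$ , and transposing back agrees with multiplying by $\mathbf{C}$ on the right, and that $k$ vertical passes followed by $k$ horizontal passes agree with the closed form.

```python
import numpy as np

def blur_matrix(d):
    """Dense d-by-d tridiagonal matrix with 1/2 on the diagonal and 1/4 beside it."""
    return 0.5 * np.eye(d) + 0.25 * (np.eye(d, k=1) + np.eye(d, k=-1))

rng = np.random.default_rng(0)
m, n, k = 6, 4, 3
X = rng.random((m, n))                 # stand-in for an image
B, C = blur_matrix(m), blur_matrix(n)

# Horizontal blur two ways, as in (8.7.3)
via_transpose = (C @ X.T).T
via_right_multiply = X @ C

def blur(X, k=k):
    """Closed form (8.7.4): B^k X C^k."""
    Bk = np.linalg.matrix_power(B, k)
    Ck = np.linalg.matrix_power(C, k)
    return Bk @ X @ Ck

# The same blur applied one pass at a time
stepwise = X.copy()
for _ in range(k):
    stepwise = B @ stepwise            # vertical pass
for _ in range(k):
    stepwise = stepwise @ C            # horizontal pass

print(np.allclose(via_transpose, via_right_multiply))
print(np.allclose(blur(X), stepwise))
```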

Example 8.7.1 (Blurring an image)

Julia

We use a readily available test image.

using Images, TestImages, Plots
img = testimage("mandrill")
m, n = size(img)
X = @. Float64(Gray(img))
plot(Gray.(X), title="Original image", aspect_ratio=1)

We define the one-dimensional tridiagonal blurring matrices.

using SparseArrays
function blurmatrix(d)
    v1 = fill(0.25, d - 1)
    return spdiagm(0 => fill(0.5, d), 1 => v1, -1 => v1)
end
B, C = blurmatrix(m), blurmatrix(n);

Finally, we show the results of using $k=12$ repetitions of the blur in each direction.

using Plots
blur = X -> B^12 * X * C^12;
Z = blur(X)
plot(Gray.(Z), title="Blurred image")

MATLAB

We use a readily available test image.

load mandrill
[m, n] = size(X);
clf
imshow(X, [0, 255])
title('Original image')

We define the one-dimensional tridiagonal blurring matrices.

v = [1/4, 1/2, 1/4];
B = spdiags(v, -1:1, m, m);
C = spdiags(v, -1:1, n, n);

Finally, we show the results of using $k=12$ repetitions of the blur in each direction.

blur = @(X) B^12 * X * C^12;
imshow(blur(X), [0, 255])
title('Blurred image');

Python

We use a readily available test image.

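# plotting helpers used in the Python examples of this section
from matplotlib.pyplot import imshow, title, subplot, axis
from skimage import data as testimages
from skimage.color import rgb2gray
img = testimages.coffee()
X = rgb2gray(img)    # grayscale intensities in [0, 1]
imshow(X, cmap="gray");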

We define the one-dimensional tridiagonal blurring matrices.

import scipy.sparse as sp
def blurmatrix(d):
    data = [[0.25] * (d-1), [0.5] * d, [0.25] * (d-1)]
    return sp.diags(data, [-1, 0, 1], shape=(d, d))

m, n = X.shape
B = blurmatrix(m)
C = blurmatrix(n)

Finally, we show the results of using $k=12$ repetitions of the blur in each direction.

from scipy.sparse.linalg import matrix_power
blur = lambda X: matrix_power(B, 12) @ X @ matrix_power(C, 12)

imshow(blur(X), cmap="gray")
title("Blurred image");

8.7.2 Deblurring

A more interesting operation is deblurring: given an image blurred by poor focus, can we reconstruct the true image? Conceptually, we want to invert the function $\operatorname{blur}(\mathbf{X})$ .

It’s easy to see from (8.7.4) that the blur operation is a linear transformation on image matrices. But an $m\times n$ image matrix is equivalent to a length- $mn$ vector—it’s just a matter of interpreting the shape of the same data. Let $\operatorname{vec}(\mathbf{X})=\mathbf{x}$ and $\operatorname{unvec}(\mathbf{x})=\mathbf{X}$ be the mathematical statements of such reshaping operations. Now say $\mathbf{X}$ is the original image and $\mathbf{Z}=\operatorname{blur}(\mathbf{X})$ is the blurred one. Then by linearity there is some matrix $\mathbf{A}$ such that

\mathbf{A} \operatorname{vec}(\mathbf{X}) = \operatorname{vec}(\mathbf{Z}),

(8.7.5)

or $\mathbf{A}\mathbf{x}=\mathbf{z}$ .

The matrix $\mathbf{A}$ is $mn\times mn$ ; for a 12-megapixel image, it would have $1.4\times 10^{14}$ entries! Admittedly, it is extremely sparse, but the point is that we don’t need it at all.
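For a tiny image we can afford to build $\mathbf{A}$ explicitly, which makes its existence concrete. The sketch below assumes that $\operatorname{vec}$ stacks the columns of the image (column-major order) and uses hypothetical helper names. It assembles $\mathbf{A}$ one column at a time by blurring unit "images," then compares the result with the Kronecker product $\mathbf{C}^k \otimes \mathbf{B}^k$ , which is the standard closed form for $\operatorname{vec}(\mathbf{B}^k\mathbf{X}\mathbf{C}^k)$ under column stacking when $\mathbf{C}$ is symmetric.

```python
import numpy as np

def blur_matrix(d):
    """Dense d-by-d tridiagonal matrix with 1/2 on the diagonal and 1/4 beside it."""
    return 0.5 * np.eye(d) + 0.25 * (np.eye(d, k=1) + np.eye(d, k=-1))

m, n, k = 5, 4, 2
Bk = np.linalg.matrix_power(blur_matrix(m), k)
Ck = np.linalg.matrix_power(blur_matrix(n), k)

def vec(X):
    return X.reshape(m * n, order="F")     # stack the columns

def unvec(x):
    return x.reshape(m, n, order="F")

def blur(X):
    return Bk @ X @ Ck

# Column j of A is the transformation applied to the j-th unit vector.
I = np.eye(m * n)
A = np.column_stack([vec(blur(unvec(I[:, j]))) for j in range(m * n)])

rng = np.random.default_rng(0)
X = rng.random((m, n))
residual = np.linalg.norm(A @ vec(X) - vec(blur(X)))
kron_gap = np.linalg.norm(A - np.kron(Ck, Bk))
print(residual, kron_gap)

# Entry count of A for a 12-megapixel image
entries_12mp = (12 * 10**6) ** 2
print(f"{entries_12mp:.2e}")
```

Building $\mathbf{A}$ this way costs one blur per pixel, so it is useful only as a check at toy sizes.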

Instead, given any vector $\mathbf{u}$ we can compute $\mathbf{v}=\mathbf{A}\mathbf{u}$ through the steps

\begin{align*} \mathbf{U} &= \operatorname{unvec}(\mathbf{u}), \\ \mathbf{V} &= \operatorname{blur}(\mathbf{U}), \\ \mathbf{v} &= \operatorname{vec}(\mathbf{V}). \end{align*}

(8.7.6)
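The three steps in (8.7.6) can be sketched on a small synthetic problem before turning to real images. The Python code below assumes SciPy is available and uses its `LinearOperator` and `minres`. The $8\times 6$ random "image," the choice $k=1$ (which keeps the toy problem well conditioned), and the helper names are all assumptions made for illustration.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, minres

def blur_matrix(d):
    """Dense d-by-d tridiagonal matrix with 1/2 on the diagonal and 1/4 beside it."""
    return 0.5 * np.eye(d) + 0.25 * (np.eye(d, k=1) + np.eye(d, k=-1))

m, n = 8, 6
B, C = blur_matrix(m), blur_matrix(n)

def blur(U):
    return B @ U @ C                   # k = 1 for this toy problem

def matvec(u):
    U = u.reshape(m, n)                # unvec
    V = blur(U)                        # blur
    return V.reshape(m * n)            # vec

# The solver sees only the action u -> A u, never the matrix A itself.
A = LinearOperator((m * n, m * n), matvec=matvec, dtype=float)

rng = np.random.default_rng(1)
X = rng.random((m, n))                 # synthetic "true" image
z = matvec(X.reshape(m * n))           # vectorized blurred image

x, info = minres(A, z, maxiter=500)    # default tolerance
rel_residual = np.linalg.norm(matvec(x) - z) / np.linalg.norm(z)
print(info, rel_residual)
```

Here `info == 0` signals that the solver stopped before reaching `maxiter`. A small residual does not by itself guarantee a small error in the recovered image, because the blur transformation becomes increasingly ill-conditioned as $k$ grows.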

The following example shows how to put these ideas into practice with a Krylov subspace solver: MINRES in the Julia version and GMRES in the MATLAB and Python versions.

Example 8.7.2 (Deblurring an image)

Julia

We repeat the earlier process to blur an original image $\mathbf{X}$ to get $\mathbf{Z}$ .


img = testimage("lighthouse")
m, n = size(img)
X = @. Float64(Gray(img))

B = spdiagm(0 => fill(0.5, m),
    1 => fill(0.25, m - 1), -1 => fill(0.25, m - 1))
C = spdiagm(0 => fill(0.5, n),
    1 => fill(0.25, n - 1), -1 => fill(0.25, n - 1))
blur = X -> B^12 * X * C^12
Z = blur(X)
plot(Gray.(Z), title="Blurred image")

Now we imagine that $\mathbf{X}$ is unknown and that we want to recover it from $\mathbf{Z}$ . We first need functions that translate between vector and matrix representations.

# vec (built-in) converts matrix to vector
unvec = z -> reshape(z, m, n);  # convert vector to matrix

Now we declare the three-step blur transformation as a LinearMap, supplying also the size of the vector form of an image.

using LinearMaps
T = LinearMap(x -> vec(blur(unvec(x))), m * n);

The blurring operators are symmetric, so we apply minres to the composite blurring transformation T.

using IterativeSolvers
y = minres(T, vec(Z), maxiter=50, reltol=1e-5);
Y = unvec(clamp01.(y))

plot(Gray.(X), layout=2, title="Original")
plot!(Gray.(Y), subplot=2, title="Deblurred")

MATLAB

We repeat the earlier process to blur an original image $\mathbf{X}$ to get $\mathbf{Z}$ .


load mandrill
[m, n] = size(X);
v = [1/4, 1/2, 1/4];
B = spdiags(v, -1:1, m, m);
C = spdiags(v, -1:1, n, n);
blur = @(X) B^12 * X * C^12;

Z = blur(X);
clf,  imshow(Z, [0, 255])
title("Blurred image");

Now we imagine that $\mathbf{X}$ is unknown and that we want to recover it from $\mathbf{Z}$ . We first need functions that translate between vector and matrix representations.

vec = @(X) reshape(X,m*n,1);
unvec = @(x) reshape(x,m,n);
T = @(x) vec( blur(unvec(x)) );

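The blurring operators are symmetric, so MINRES is applicable to the composite blurring transformation T. The call below uses gmres instead, which does not require symmetry, with a restart length of 50 and a relative residual tolerance of 1e-5.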

y = gmres(T, vec(Z), 50, 1e-5);
Y = unvec(y);

subplot(121)
imshow(X, [0, 255])
title("Original")
subplot(122)
imshow(Y, [0, 255])
title("Deblurred");

gmres(50) converged at outer iteration 2 (inner iteration 45) to a solution with relative residual 1e-05.

Python

We repeat the earlier process to blur an original image $\mathbf{X}$ to get $\mathbf{Z}$ .


img = getattr(testimages, "coffee")()
X = rgb2gray(img)
m, n = X.shape

import scipy.sparse as sp
def blurmatrix(d):
    data = [[0.25] * (d-1), [0.5] * d, [0.25] * (d-1)]
    return sp.diags(data, [-1, 0, 1], shape=(d, d))
B = blurmatrix(m)
C = blurmatrix(n)

from scipy.sparse.linalg import matrix_power
blur = lambda X: matrix_power(B, 12) @ X @ matrix_power(C, 12)

Z = blur(X)
imshow(Z, cmap="gray")
title("Blurred image");

Now we imagine that $\mathbf{X}$ is unknown and that we want to recover it from $\mathbf{Z}$ . We first need functions that translate between vector and matrix representations.

from scipy.sparse.linalg import LinearOperator
vec = lambda Z: Z.reshape(m * n)
unvec = lambda z: z.reshape(m, n)
xform = lambda x: vec(blur(unvec(x)))

Now we declare the three-step blur transformation as a LinearOperator, supplying also the size of the vector form of an image.

T = LinearOperator((m * n, m * n), matvec=xform)

The blurring operators are symmetric, so MINRES is applicable to the composite blurring transformation T. The call below uses gmres instead, which does not require symmetry.

import numpy as np
from scipy.sparse.linalg import gmres
y, flag = gmres(T, vec(Z), rtol=1e-5, maxiter=50)
Y = unvec(np.clip(y, 0, 1))    # clamp intensities to [0, 1]


subplot(1, 2, 1),  imshow(X, cmap="gray")
axis("off"),  title("Original")
subplot(1, 2, 2),  imshow(Y, cmap="gray")
axis("off"),  title("Deblurred");

8.7.3 Exercises

Exercise 8.7.2

✍ In each case, state with reasons whether the given transformation on $n$ -vectors is linear.

(a) $\,\mathbf{f}(\mathbf{x}) = \begin{bmatrix} x_2\\x_3 \\\vdots\\ x_n \\ x_1 \end{bmatrix}\qquad$ (b) $\,\mathbf{f}(\mathbf{x}) = \begin{bmatrix} x_1\\x_1+x_2\\x_1+x_2+x_3\\\vdots\\x_1+\cdots+x_n \end{bmatrix} \qquad$ (c) $\,\mathbf{f}(\mathbf{x}) = \begin{bmatrix} x_1 + 1 \\x_2 + 2 \\ x_3 + 3 \\\vdots \\ x_n+n \end{bmatrix} \qquad$ (d) $\,\mathbf{f}(\mathbf{x}) = \|\mathbf{x}\|_\infty\, \mathbf{e}_1$
