Fixed-point iteration - Fundamentals of Numerical Computation

In this section, we consider the alternative form of the rootfinding problem known as the fixed-point problem.

Given $f$ for rootfinding, we could define $g(x)=x-f(x)$, and then $f(r)=0$ implies $g(r)=r$ and vice versa. There are infinitely many ways to make this transformation, such as $g(x)=x+cf(x)$ for any nonzero constant $c$. The process can be reversed, too. Given $g(x)$, we could define $f(x)=x-g(x)$, and then $g(p)=p$ implies $f(p)=0$.
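As a quick numerical check of this equivalence, the following sketch builds $g(x)=x+cf(x)$ for a few values of $c$ and confirms that a root of $f$ is a fixed point of each one. The quadratic is the one used in Example 4.2.1; the helper name `make_g` and the sample values of `c` are assumptions chosen only for illustration.

```python
import math


def f(x):
    # Quadratic from Example 4.2.1
    return x**2 - 4 * x + 3.5


def make_g(c):
    """Return g(x) = x + c*f(x); every root of f is a fixed point of g."""
    return lambda x: x + c * f(x)


# Larger root of f, from the quadratic formula: x = 2 + sqrt(1/2)
r = 2 + math.sqrt(0.5)

for c in (-1.0, 0.5, 3.0):
    g = make_g(c)
    # The difference g(r) - r equals c*f(r), which is at roundoff level
    print(c, abs(g(r) - r))
```

Although every such $g$ has the same fixed points, the choice of $c$ changes $g'$ and can therefore change whether an iteration based on $g$ converges.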

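There is an extraordinarily simple way to try to find a fixed point of any given $g(x)$: choose a starting value $x_1$ and repeatedly apply $g$, setting $x_{k+1}=g(x_k)$ for $k=1,2,\ldots$. This fixed-point iteration is the relation cited later as (4.2.1).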

This is our first example of an iterative algorithm that never quite gets to the answer, even if we use exact numbers. The idea is to generate a sequence of values that one hopes will converge to the correct result, and stop when we are satisfied that we are close enough to the limit.
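The update $x_{k+1}=g(x_k)$ can be sketched in a few lines of Python. The function name `fixed_point_iterate`, the parameters `tol` and `maxiter`, and the stopping rule based on the difference between successive iterates are all assumptions made for this illustration, not part of the text.

```python
def fixed_point_iterate(g, x1, tol=1e-8, maxiter=100):
    """Repeat x <- g(x) starting from x1 and return the list of iterates.

    Stopping when successive iterates differ by less than tol is an assumed
    heuristic; it does not by itself guarantee that the error is below tol.
    """
    x = [x1]
    for _ in range(maxiter):
        x.append(g(x[-1]))
        if abs(x[-1] - x[-2]) < tol:
            break
    return x


# Example: g(x) = x - f(x) with f(x) = x^2 - 4x + 3.5
g = lambda x: x - (x**2 - 4 * x + 3.5)
iterates = fixed_point_iterate(g, 2.1)
print(len(iterates), iterates[-1])
```

Returning the whole sequence rather than only the final value makes it easy to study how the error behaves from one iteration to the next.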

Example 4.2.1 (Fixed-point iteration)

Julia

Let’s recast finding the roots of a quadratic polynomial $f(x)$ as a fixed-point problem.

using Polynomials
p = Polynomial([3.5, -4,1])
r = roots(p)
rmin, rmax = extrema(r)
@show rmin, rmax;

(rmin, rmax) = (1.2928932188134525, 2.7071067811865475)

We define $g(x)=x-p(x)$ .

g(x) = x - p(x)

g (generic function with 1 method)

Intersections of $y=g(x)$ with the line $y=x$ are fixed points of $g$ and thus roots of $f$ . (Only one is shown in the chosen plot range.)

using Plots, LaTeXStrings   # LaTeXStrings supplies the L"..." string macro
plt = plot([g x->x], 2, 3;
    l=2, label=[L"y=g(x)" L"y=x"],
    xlabel=L"x",  ylabel=L"y", 
    aspect_ratio=1,
    title="Finding a fixed point",  legend=:bottomright)

If we evaluate $g(2.1)$ , we get a value of almost 2.6, so this is not a fixed point.

x = 2.1;
y = g(x)

2.59

However, $y=g(x)$ is considerably closer to the fixed point at around 2.7 than $x$ is. Suppose then that we adopt $y$ as our new $x$ value. Changing the $x$ coordinate in this way is the same as following a horizontal line over to the graph of $y=x$ .

plot!([x, y], [y, y], arrow=true, color=3)

Now we can compute a new value for $y$ . We leave $x$ alone here, so we travel along a vertical line to the graph of $g$ .

x = y;  y = g(x)
plot!([x, x], [x, y], arrow=true, color=4)

You see that we are in a position to repeat these steps as often as we like. Let’s apply them a few times and see the result.

for k = 1:5
    plot!([x, y], [y, y], color=3);  
    x = y       # y becomes the new x
    y = g(x)    # g(x) becomes the new y
    plot!([x, x], [x, y], color=4)  
end
plt

The process spirals in beautifully toward the fixed point we seek. Our last estimate has almost 4 accurate digits.

abs(y - rmax) / rmax

0.0001653094344995643

Now let’s try to find the other fixed point $\approx 1.29$ in the same way. We’ll use 1.3 as a starting approximation.

plt = plot([g x->x], 1, 2, l=2, label=["y=g(x)" "y=x"], aspect_ratio=1, 
    xlabel=L"x", ylabel=L"y", title="Divergence", legend=:bottomright)

x = 1.3; y = g(x);
arrow = false
for k = 1:5
    plot!([x, y], [y, y], arrow=arrow, color=3)  
    x = y       # y --> new x
    y = g(x)    # g(x) --> new y
    plot!([x, x], [x, y], arrow=arrow, color=4)
    if k > 2; arrow = true; end
end
plt

This time, the iteration is pushing us away from the correct answer.

MATLAB

Let’s recast finding the roots of a quadratic polynomial $f(x)$ as a fixed-point problem.

f = @(x) x.^2 - 4*x + 3.5;
r = roots([1, -4, 3.5])

We define $g(x)=x-f(x)$.

g = @(x) x - f(x);

Intersections of $y=g(x)$ with the line $y=x$ are fixed points of $g$ and thus roots of $f$ . (Only one is shown in the chosen plot range.)

clf
fplot(g, [2, 3])
hold on,  plot([2, 3], [2, 3], 'k')
title('Finding a fixed point'),  axis equal  
xlabel('x'),  ylabel('y')

If we evaluate $g(2.1)$ , we get a value of almost 2.6, so this is not a fixed point.

x = 2.1;
y = g(x)

plot([x, y], [y, y], '-')
x = y;

Now we can compute a new value for $y$ . We leave $x$ alone here, so we travel along a vertical line to the graph of $g$ .

y = g(x)
plot([x, x],[x, y], '-')

You see that we are in a position to repeat these steps as often as we like. Let’s apply them a few times and see the result.

for k = 1:5
    plot([x, y], [y, y], '-')
    x = y;       % y --> new x
    y = g(x);    % g(x) --> new y
    plot([x, x], [x, y], '-')  
end

The process spirals in beautifully toward the fixed point we seek. Our last estimate has almost 4 accurate digits.

abs(y - max(r)) / max(r)   % max(r) selects the larger root regardless of ordering

Now let’s try to find the other fixed point $\approx 1.29$ in the same way. We’ll use 1.3 as a starting approximation.

cla
fplot(g, [1, 2])
hold on, plot([1, 2], [1, 2], 'k')
ylim([1, 2])
x = 1.3;  y = g(x);
for k = 1:5
    plot([x, y], [y, y], '-'),  
    x = y;       % y --> new x
    y = g(x);    % g(x) --> new y
    plot([x, x], [x, y], '-') 
end
title('No convergence')

This time, the iteration is pushing us away from the correct answer.

Python

Let’s recast finding the roots of a quadratic polynomial $f(x)$ as a fixed-point problem.

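from numpy import poly1d, linspace
from matplotlib.pyplot import subplots, axis, legend, title, ylim
f = poly1d([1, -4, 3.5])
r = f.roots
print(r)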

[2.70710678 1.29289322]

We define $g(x)=x - f(x)$ .

g = lambda x: x - f(x)

Intersections of $y=g(x)$ with the line $y=x$ are fixed points of $g$ and thus roots of $f$ . (Only one is shown in the chosen plot range.)

fig, ax = subplots()
g = lambda x: x - f(x)
xx = linspace(2, 3, 400)
ax.plot(xx, g(xx), label="y=g(x)")
ax.plot(xx, xx, label="y=x")
axis("equal"), legend()
title("Finding a fixed point");

If we evaluate $g(2.1)$ , we get a value of almost 2.6, so this is not a fixed point.

x = 2.1
y = g(x)
print(y)

2.59

ax.plot([x, y], [y, y], "r:", label="")
fig

Now we can compute a new value for $y$ . We leave $x$ alone here, so we travel along a vertical line to the graph of $g$ .

x = y
y = g(x)
print("y:", y)
ax.plot([x, x], [x, y], "k:")
fig

y: 2.7419000000000002

You see that we are in a position to repeat these steps as often as we like. Let’s apply them a few times and see the result.

for k in range(5):
    ax.plot([x, y], [y, y], "r:")
    x = y       # y --> new x
    y = g(x)    # g(x) --> new y
    ax.plot([x, x], [x, y], "k:")  
fig

The process spirals in beautifully toward the fixed point we seek. Our last estimate has almost 4 accurate digits.

print(abs(y - max(r)) / max(r))

0.0001653094344995643

Now let’s try to find the other fixed point $\approx 1.29$ in the same way. We’ll use 1.3 as a starting approximation.

xx = linspace(1, 2, 400)
fig, ax = subplots()
ax.plot(xx, g(xx), label="y=g(x)")
ax.plot(xx, xx, label="y=x")
ax.set_aspect(1.0)
ax.legend()

x = 1.3
y = g(x)
for k in range(5):
    ax.plot([x, y], [y, y], "r:")
    x = y
    y = g(x)
    ax.plot([x, x], [x, y], "k:")
ylim(1, 2.5)
title("No convergence");

This time, the iteration is pushing us away from the correct answer.

4.2.1 Series analysis

In Example 4.2.1, the two computed iterations differ only in the choice of $x_1$ . In the first case, we evidently generated a sequence that converged to one of the fixed points. In the second case, however, the generated sequence diverged.^[1] The easiest way to uncover the essential difference between the two cases is to use a Taylor series expansion.

Suppose a fixed point $p$ is the desired limit of an iteration $x_1,x_2,\ldots$ . It’s often easier to express quantities in terms of the error sequence $\epsilon_1,\epsilon_2,\ldots,$ where $\epsilon_k=x_k-p$ . Starting from (4.2.1), we have

$$\epsilon_{k+1}+p = g( \epsilon_{k}+p ) = g(p) + g'(p) \epsilon_k + \frac{1}{2}g''(p) \epsilon_k^2 + \cdots, \tag{4.2.2}$$

assuming that $g$ has at least two continuous derivatives. But by definition, $g(p)=p$ , so

$$\epsilon_{k+1} = g'(p) \epsilon_k + O(\epsilon_k^2). \tag{4.2.3}$$

If the iteration is to converge to $p$ , the errors must approach zero. In this case we can neglect the second-order term and conclude that $\epsilon_{k+1} \approx g'(p) \epsilon_k$ . This is consistent with convergence if $|g'(p)|<1$ . However, if $|g'(p)| >1$ , we are led to the conclusion that the errors must grow, not vanish, even if they start quite small.
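To connect this criterion with Example 4.2.1, the sketch below evaluates $g'$ at both fixed points of $g(x)=x-f(x)$, where $f(x)=x^2-4x+3.5$. The derivative $g'(x)=5-2x$ is worked out by hand, and the variable names are assumptions for illustration.

```python
import math

# g(x) = x - (x^2 - 4x + 3.5), so g'(x) = 1 - (2x - 4) = 5 - 2x
dg = lambda x: 5 - 2 * x

p_lo = 2 - math.sqrt(0.5)   # fixed point near 1.29
p_hi = 2 + math.sqrt(0.5)   # fixed point near 2.71

print(dg(p_hi))   # magnitude below 1: nearby errors shrink
print(dg(p_lo))   # magnitude above 1: nearby errors grow
```

The values are $1-\sqrt{2}\approx -0.41$ and $1+\sqrt{2}\approx 2.41$, which matches the convergence observed near 2.71 and the divergence observed near 1.29.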

4.2.2 Linear convergence

In computation, we usually want to know not just whether an iteration converges but also the rate at which convergence occurs, i.e., how quickly the errors approach zero. Other things being equal, faster convergence is preferred to slower convergence, as it usually implies that the computation will take less time to achieve a desired accuracy.

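The prediction of the series analysis above is that if the fixed-point iteration converges, the errors satisfy $|\epsilon_{k+1}| \approx \sigma|\epsilon_k|$, for $\sigma = |g'(p)| < 1$. This is a well-known type of convergence called linear convergence, in which the ratios satisfy $\lim_{k\to\infty} |\epsilon_{k+1}|/|\epsilon_k| = \sigma$ for some $0<\sigma<1$; this limit is the relation cited as (4.2.4).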

If we suppose that the ratios in (4.2.4) all equal $\sigma$ (i.e., perfect linear convergence), then $|\epsilon_k| = C \sigma^k$ . Taking logs, we get

$$\log |\epsilon_k| = k(\log \sigma) + (\log C). \tag{4.2.5}$$

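This is in the form $\log |\epsilon_k| = \alpha k + \beta$, which is a linear relationship in $k$ with slope $\alpha=\log\sigma$. On a plot with a logarithmic error axis, linearly convergent errors therefore fall on a straight line.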
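The relationship (4.2.5) suggests a way to estimate $\sigma$ from data: fit a line to $\log|\epsilon_k|$ against $k$ and exponentiate the slope. The sketch below does this for synthetic, perfectly linear errors; the values of `sigma` and `C` are assumptions chosen for illustration, and the fit uses the closed-form least-squares slope so that only the standard library is needed.

```python
import math

sigma, C = 0.4, 2.0                      # assumed values for illustration
k = list(range(1, 11))
log_err = [math.log(C * sigma**j) for j in k]

# Closed-form least-squares slope of log_err versus k
n = len(k)
kbar = sum(k) / n
ybar = sum(log_err) / n
slope = sum((a - kbar) * (b - ybar) for a, b in zip(k, log_err)) / sum(
    (a - kbar) ** 2 for a in k
)

print(math.exp(slope))   # recovers sigma up to roundoff
```

With real iterates the early errors usually do not yet follow the linear trend, so they are typically left out of the fit.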

Example 4.2.3 (Convergence of fixed-point iteration)

Julia

We revisit Example 4.2.1 and investigate the observed convergence more closely. Since $g'(x)=5-2x$ for this $g$, we have $g'(p)\approx-0.41$ at the convergent fixed point $p\approx 2.71$.

p = Polynomial([3.5, -4, 1])
r = roots(p)
rmin, rmax = extrema(r)
@show rmin, rmax;

(rmin, rmax) = (1.2928932188134525, 2.7071067811865475)

Here is the fixed-point iteration. This time we keep track of the whole sequence of approximations.

g(x) = x - p(x)
x = [2.1]
for k = 1:12
    push!(x, g(x[k]))
end
x

13-element Vector{Float64}:
 2.1
 2.59
 2.7419000000000002
 2.69148439
 2.713333728386328
 2.7044887203327885
 2.7081843632566587
 2.7066592708954196
 2.7072919457529734
 2.7070300492259465
 2.707138558717502
 2.707093617492436
 2.7071122335938966

It’s illuminating to construct and plot the sequence of errors.

err = @. abs(x - rmax)
plot(0:12, err;
    m=:o,
    xaxis=("iteration number"),  yaxis=("error", :log10),
    title="Convergence of fixed-point iteration")

It’s quite clear that the convergence quickly settles into a linear rate. We could estimate this rate by doing a least-squares fit to a straight line. Keep in mind that the values for small $k$ should be left out of the computation, as they don’t represent the linear trend.

y = log.(err[5:12])
p = Polynomials.fit(5:12, y, 1)

We can exponentiate the slope to get the convergence constant $\sigma$ .

σ = exp(p.coeffs[2])

0.4144851385485472

The error should therefore decrease by a factor of $\sigma$ at each iteration. We can check this easily from the observed data.

[err[i+1] / err[i] for i in 8:11]

4-element Vector{Float64}:
 0.4137660520817109
 0.4143987269383
 0.4141368304124451
 0.4142453399049934

The methods for finding $\sigma$ agree well.

MATLAB

We revisit Example 4.2.1 and investigate the observed convergence more closely. Since $g'(x)=5-2x$ for this $g$, we have $g'(p)\approx-0.41$ at the convergent fixed point $p\approx 2.71$.

f = @(x) x.^2 - 4*x + 3.5;
r = roots([1, -4, 3.5]);

Here is the fixed-point iteration. This time we keep track of the whole sequence of approximations.

g = @(x) x - f(x);
x = 2.1; 
for k = 1:12
    x(k+1) = g(x(k));
end

It’s illuminating to construct and plot the sequence of errors.

err = abs(x - max(r));   % max(r) selects the larger root regardless of ordering
clf
semilogy(err, 'o-'), axis tight
xlabel('iteration'),  ylabel('error')
title('Convergence of fixed-point iteration')

y = log(err(5:12));
p = polyfit(5:12, y, 1);

We can exponentiate the slope to get the convergence constant $\sigma$ .

sigma = exp(p(1))

The error should therefore decrease by a factor of $\sigma$ at each iteration. We can check this easily from the observed data.

err(9:12) ./ err(8:11)

The methods for finding $\sigma$ agree well.

Python

We revisit Example 4.2.1 and investigate the observed convergence more closely. Since $g'(x)=5-2x$ for this $g$, we have $g'(p)\approx-0.41$ at the convergent fixed point $p\approx 2.71$.

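from numpy import poly1d, zeros, arange, log, exp, polyfit
from matplotlib.pyplot import semilogy, xlabel, ylabel, title
f = poly1d([1, -4, 3.5])
r = f.roots
print(r)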

[2.70710678 1.29289322]

Here is the fixed-point iteration. This time we keep track of the whole sequence of approximations.

g = lambda x: x - f(x)
x = zeros(12)
x[0] = 2.1
for k in range(11):
    x[k + 1] = g(x[k])

print(x)

[2.1        2.59       2.7419     2.69148439 2.71333373 2.70448872
 2.70818436 2.70665927 2.70729195 2.70703005 2.70713856 2.70709362]

It’s illuminating to construct and plot the sequence of errors.

err = abs(x - max(r))
semilogy(err, "-o")
xlabel("iteration number"), ylabel("error")
title("Convergence of fixed-point iteration");

p = polyfit(arange(5, 13), log(err[4:]), 1)
print(p)

[-0.88071816 -0.66805739]

We can exponentiate the slope to get the convergence constant $\sigma$ .

print("sigma:", exp(p[0]))

sigma: 0.4144851385485472

The error should therefore decrease by a factor of $\sigma$ at each iteration. We can check this easily from the observed data.

err[8:] / err[7:-1]

array([0.41376605, 0.41439873, 0.41413683, 0.41424534])

The methods for finding $\sigma$ agree well.

4.2.3 Contraction maps

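The convergence condition $\sigma=|g'(p)|<1$ derived by series expansion is a special case of a more general condition. A function $g$ satisfies a Lipschitz condition with constant $L$ on a set $S$ if $|g(s)-g(t)|\le L|s-t|$ for all $s,t\in S$; this is the condition cited as (4.2.6). When $L<1$, $g$ is called a contraction mapping on $S$.

It can be shown that a function satisfying (4.2.6) is continuous in $S$. The idea behind a contraction mapping is that the distances between all pairs of points decrease after an application of $g$. This situation leads to a major result about fixed points, Theorem 4.2.1: if $g$ is a contraction mapping on $S$, then $S$ contains exactly one fixed point $p$ of $g$, and fixed-point iterates that remain in $S$ converge to $p$.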

Proof

(partial proof) First we show there is at most one fixed point in $S$ . Suppose $g(t)=t$ and $g(s)=s$ in $S$ . Then by (4.2.6), $|s-t|=|g(s)-g(t)|\le L|s-t|$ , which for $L<1$ is possible only if $|s-t|=0$ , so $s=t$ .

Now suppose that for some $p\in S$ , $g(p)=p$ . By the definition of the fixed-point iteration and the Lipschitz condition,

$$|x_{k+1} - p | = |g(x_k) - g(p)| \le L |x_k-p|, \tag{4.2.7}$$

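which, applied repeatedly, gives $|x_k-p|\le L^{k-1}|x_1-p|$ as long as the iterates stay in $S$. Since $L<1$, it follows that $x_k\to p$ as $k\to \infty$. To show that $p$ must exist and complete the proof, one needs to apply the Cauchy theory of convergence of a sequence, which is beyond the scope of this text.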

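From the Fundamental Theorem of Calculus, which asserts that $g(s)-g(t)=\int_t^s g'(x)\, dx$, it’s easy to conclude that an upper bound of $|g'(x)|\le L$ for all $x$ in an interval $S$ results in (4.2.6). Hence if $|g'(x)|\le L<1$ throughout $S$, then $g$ is a contraction mapping on $S$ and Theorem 4.2.1 applies.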

There are stronger and more general statements of Theorem 4.2.1. For instance, it’s possible to show that all initial $x_1$ that are sufficiently close to the fixed point will lead to convergence of the iteration. Algorithmically the main virtue of the fixed-point iteration is that it is incredibly easy to apply. However, as we are about to discover, it’s far from the fastest option.
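The contraction bound (4.2.7) can be checked numerically on the running example $g(x)=x-(x^2-4x+3.5)$. In the sketch below, the interval $S=[2.5,2.9]$ is an assumption chosen for illustration: on it $g'(x)=5-2x$ lies between $-0.8$ and $0$, so $L=0.8$ serves as a Lipschitz constant, and $g$ maps $S$ into itself.

```python
import math

g = lambda x: x - (x**2 - 4 * x + 3.5)
dg = lambda x: 5 - 2 * x                 # derivative of g, computed by hand

a, b = 2.5, 2.9                          # assumed interval S containing p
# g' is linear, so |g'| attains its maximum over [a, b] at an endpoint
L = max(abs(dg(a)), abs(dg(b)))
p = 2 + math.sqrt(0.5)                   # fixed point inside S

x = a
for k in range(8):
    x_new = g(x)
    # (4.2.7): each step reduces the error by at least the factor L
    assert abs(x_new - p) <= L * abs(x - p) + 1e-15
    # The iterates must remain in S for the bound to keep applying
    assert a <= x_new <= b
    x = x_new

print(L, abs(x - p))
```

The observed reduction per step is considerably better than the factor $L$, because $L$ is a worst-case bound over all of $S$ while the asymptotic rate depends only on $|g'(p)|$.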

4.2.4 Exercises

Footnotes

[1] We can only ever generate a finite sample from an infinite sequence, which in principle does not guarantee anything whatsoever about the limit or divergence of that sequence. However, in practical computing one usually assumes that well-established trends in the sequence will continue, and we complement observed experience with rigorous theory where possible.