Interpolation-based methods
Tobin A. Driscoll and Richard J. Braun
From a practical standpoint, one of the biggest drawbacks of Newton’s method is the requirement to supply $f'$ in Function 4.3.1. It is both a programming inconvenience and a step that requires computational time. We can avoid using $f'$, however, by making a simple but easily overlooked observation:
Let’s call this the principle of approximate approximation.
In the Newton context, the principle of approximate approximation begins with the observation that the use of $f'$ is linked to the construction of a linear approximation $q(x)$ equivalent to a tangent line. The root of $q(x)$ is used to define the next iterate in the sequence. We can avoid calculating the value of $f'$ by choosing a different linear approximation.
Example 4.4.1 (Graphical interpretation of the secant method)
Example 4.4.1 We return to finding a root of the equation $xe^x=2$.
using Plots, LaTeXStrings   # LaTeXStrings provides the L"..." axis labels below
f(x) = x * exp(x) - 2;
plot(f, 0.25, 1.25;
    label="function", legend=:topleft,
    xlabel=L"x", ylabel=L"y")
From the graph, it’s clear that there is a root near $x=1$. To be more precise, there is a root in the interval $[0.5,1]$. So let us take the endpoints of that interval as two initial approximations.
x₁ = 1;
y₁ = f(x₁);
x₂ = 0.5;
y₂ = f(x₂);
scatter!([x₁, x₂], [y₁, y₂];
label="initial points",
title="Two initial values")
Instead of constructing the tangent line by evaluating the derivative, we can construct a linear model function by drawing the line between the two points $\bigl(x_1,f(x_1)\bigr)$ and $\bigl(x_2,f(x_2)\bigr)$. This is called a secant line.
m₂ = (y₂ - y₁) / (x₂ - x₁)
secant = x -> y₂ + m₂ * (x - x₂)
plot!(secant, 0.25, 1.25, label="secant line", l=:dash, color=:black,
title="Secant line")
As before, the next root estimate in the iteration is the root of this linear model.
x₃ = x₂ - y₂ / m₂
@show y₃ = f(x₃)
scatter!([x₃], [0], label="root of secant", title="First iteration")
For the next linear model, we use the line through the two most recent points. The next iterate is the root of that secant line, and so on.
m₃ = (y₃ - y₂) / (x₃ - x₂)
x₄ = x₃ - y₃ / m₃
Example 4.4.1 We return to finding a root of the equation $xe^x=2$.
f = @(x) x .* exp(x) - 2;
clf, fplot(f, [0.25, 1.25])
set(gca, 'ygrid', 'on')
xlabel('x'), ylabel('y')
title('Objective function')
From the graph, it’s clear that there is a root near $x=1$. To be more precise, there is a root in the interval $[0.5,1]$. So let us take the endpoints of that interval as two initial approximations.
x1 = 1; y1 = f(x1);
x2 = 0.5; y2 = f(x2);
hold on, scatter([x1, x2], [y1, y2])
title('Two initial values')
Instead of constructing the tangent line by evaluating the derivative, we can construct a linear model function by drawing the line between the two points $\bigl(x_1,f(x_1)\bigr)$ and $\bigl(x_2,f(x_2)\bigr)$. This is called a secant line.
slope2 = (y2 - y1) / (x2 - x1);
secant2 = @(x) y2 + slope2 * (x - x2);
As before, the next root estimate in the iteration is the root of this linear model.
fplot(secant2,[0.25, 1.25],'k--')
x3 = x2 - y2 / slope2;
y3 = f(x3)
scatter(x3, 0)
title('Next value')
For the next linear model, we use the line through the two most recent points. The next iterate is the root of that secant line, and so on.
slope2 = (y3 - y2) / (x3 - x2);
x4 = x3 - y3 / slope2;
y4 = f(x4)
Example 4.4.1 We return to finding a root of the equation $xe^x=2$.
from numpy import *
from matplotlib.pyplot import *   # setup assumed by the Python demos below

f = lambda x: x * exp(x) - 2
xx = linspace(0.25, 1.25, 400)
fig, ax = subplots()
ax.plot(xx, f(xx), label="function")
ax.set_xlabel("$x$")
ax.set_ylabel("$f(x)$")
ax.grid();
From the graph, it’s clear that there is a root near $x=1$. To be more precise, there is a root in the interval $[0.5,1]$. So let us take the endpoints of that interval as two initial approximations.
x1 = 1
y1 = f(x1)
x2 = 0.5
y2 = f(x2)
ax.plot([x1, x2], [y1, y2], "ko", label="initial points")
ax.legend()
fig
Instead of constructing the tangent line by evaluating the derivative, we can construct a linear model function by drawing the line between the two points $\bigl(x_1,f(x_1)\bigr)$ and $\bigl(x_2,f(x_2)\bigr)$. This is called a secant line.
slope2 = (y2 - y1) / (x2 - x1)
secant2 = lambda x: y2 + slope2 * (x - x2)
ax.plot(xx, secant2(xx), "--", label="secant line")
ax.legend()
fig
As before, the next root estimate in the iteration is the root of this linear model.
x3 = x2 - y2 / slope2
ax.plot(x3, 0, "o", label="root of secant")
y3 = f(x3)
print(y3)
ax.legend()
fig
For the next linear model, we use the line through the two most recent points. The next iterate is the root of that secant line, and so on.
slope3 = (y3 - y2) / (x3 - x2)
x4 = x3 - y3 / slope3
print(f(x4))
The example in Example 4.4.1 demonstrates the secant method. In the secant method, one finds the root of the linear approximation through the two most recent root estimates. That is, given previous approximations $x_1,\ldots,x_k$, define the linear model function as the line through $\bigl(x_{k-1},f(x_{k-1})\bigr)$ and $\bigl(x_k,f(x_k)\bigr)$:
$$q(x) = f(x_k) + \frac{f(x_k)-f(x_{k-1})}{x_k-x_{k-1}}(x-x_k).$$
Solving $q(x_{k+1})=0$ for $x_{k+1}$ gives the following iteration formula.
Definition 4.4.1 (Secant method)
Given function $f$ and two initial values $x_1$ and $x_2$, define
$$x_{k+1} = x_k - \frac{f(x_k)(x_k-x_{k-1})}{f(x_k)-f(x_{k-1})}, \quad k=2,3,\ldots.$$
Return the sequence $\{x_k\}$.
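As a quick illustration (a minimal sketch, not the implementation below), the formula can be applied directly to the function from Example 4.4.1. Note that this naive version reevaluates $f$ at both points on every step, which the implementation below avoids.

# Minimal sketch of the iteration in Definition 4.4.1 (not Function 4.4.1 itself).
secant_step(f, xold, xnew) = xnew - f(xnew) * (xnew - xold) / (f(xnew) - f(xold))

f(x) = x * exp(x) - 2
x = [1.0, 0.5]                   # the two initial values from Example 4.4.1
for k in 2:7
    push!(x, secant_step(f, x[k-1], x[k]))
end
@show x[end]                     # approaches the root near 0.8526055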
Our implementation of the secant method is given in Function 4.4.1.
Secant method
"""
secant(f, x₁, x₂ [;maxiter, ftol, xtol])
Use the secant method to find a root of `f` starting from `x₁` and
`x₂`. Returns a vector of root estimates.
The optional keyword parameters set the maximum number of iterations
and the stopping tolerance for values of `f` and changes in `x`.
"""
function secant(f, x₁, x₂; maxiter = 40, ftol = 1e-13, xtol = 1e-13)
x = [float(x₁), float(x₂)]
y₁ = f(x₁)
Δx, y₂ = Inf, Inf # for initial pass in the loop below
k = 2
while (abs(Δx) > xtol) && (abs(y₂) > ftol)
y₂ = f(x[k])
Δx = -y₂ * (x[k] - x[k-1]) / (y₂ - y₁) # secant step
push!(x, x[k] + Δx) # append new estimate
k += 1
y₁ = y₂ # current f-value becomes the old one next time
if k == maxiter
@warn "Maximum number of iterations reached."
break # exit loop
end
end
return x
end
About the code
Because we want to observe the convergence of the method, Function 4.4.1 stores and returns the entire sequence of root estimates. However, only the most recent two are needed by the iterative formula. This is demonstrated by the use of y₁ and y₂ for the two most recent values of $f$.
Secant method
function x = secant(f, x1, x2)
    % SECANT   Secant method for a scalar equation.
    % Input:
    %   f        objective function
    %   x1, x2   initial root approximations
    % Output:
    %   x        vector of root approximations (last is best)

    % Stopping parameters.
    funtol = 100 * eps;    % stop for small f(x)
    xtol = 100 * eps;      % stop for small x change
    maxiter = 40;          % stop after this many iterations

    x = [x1 x2];
    y1 = f(x1);  y2 = f(x2);
    k = 2;       % iteration counter
    dx = Inf;    % for initial pass below
    while (abs(dx) > xtol) && (abs(y2) > funtol) && (k < maxiter)
        dx = -y2 * (x(k) - x(k-1)) / (y2 - y1);    % secant step
        x(k+1) = x(k) + dx;
        k = k + 1;
        y1 = y2;    % current f-value becomes the old one next time
        y2 = f(x(k));
    end
    if k == maxiter
        warning('Maximum number of iterations reached.')
    end
end
Secant method
import numpy as np
import warnings

def secant(f, x1, x2):
    """
    secant(f, x1, x2)

    Use the secant method to find a root of f starting from x1 and x2. Returns a
    vector of root estimates.
    """
    # Operating parameters.
    eps = np.finfo(float).eps
    funtol = 100 * eps
    xtol = 100 * eps
    maxiter = 40

    x = np.zeros(maxiter + 1)   # room for the 2 starting values plus maxiter-1 steps
    x[:2] = [x1, x2]
    y1 = f(x1)
    y2 = 100          # placeholder to guarantee at least one pass through the loop
    dx = np.inf       # for initial pass below
    k = 1
    while (abs(dx) > xtol) and (abs(y2) > funtol) and (k < maxiter):
        y2 = f(x[k])
        dx = -y2 * (x[k] - x[k - 1]) / (y2 - y1)   # secant step
        x[k + 1] = x[k] + dx    # new estimate
        k = k + 1
        y1 = y2    # current f-value becomes the old one next time
        if k == maxiter:
            warnings.warn("Maximum number of iterations reached.")
    return x[:k+1]
About the code
Because we want to observe the convergence of the method, Function 4.4.1 stores and returns the entire sequence of root estimates. However, only the most recent two are needed by the iterative formula. This is demonstrated by the use of y1 and y2 for the two most recent values of $f$.
4.4.1 Convergence
Graphically, a secant line usually looks like a less accurate model of $f$ than the tangent line. How will that affect the convergence?
As before, let $\epsilon_k = x_k-r$ be the errors in the successive root approximations, and assume that $r$ is a simple root. If the initial errors are small, then a tedious but straightforward Taylor expansion shows that, to lowest order,
$$\epsilon_{k+1} \approx \frac{1}{2}\frac{f''(r)}{f'(r)}\, \epsilon_k \epsilon_{k-1}.$$
If we make an educated guess that
$$\epsilon_{k+1} = c (\epsilon_k)^\alpha, \quad \epsilon_k = c (\epsilon_{k-1})^\alpha, \ldots, \qquad \alpha>0,$$
then (4.4.3) becomes
$$\left[ \epsilon_{k-1}^{\alpha} \right]^{\alpha} \approx C \epsilon_{k-1}^{\alpha+1}$$
for an unknown constant $C$. Treating the approximation as an equality, this becomes solvable if and only if the exponents match, i.e., $\alpha^2 = \alpha+1$. The only positive root of this equation is the golden ratio,
$$\alpha = \frac{1+\sqrt{5}}{2} \approx 1.618.$$
Hence, the errors in the secant method converge like $\epsilon_{k+1} = c (\epsilon_k)^\alpha$ for $1<\alpha<2$.
Definition 4.4.2 (Superlinear convergence)
Suppose a sequence $x_k$ approaches limit $x^*$. If the error sequence $\epsilon_k=x_k - x^*$ satisfies
$$\lim_{k\to\infty} \frac{|\epsilon_{k+1}|}{|\epsilon_k|^\alpha} = L$$
for constants $\alpha >1$ and $L>0$, then the sequence has superlinear convergence with rate $\alpha$.
Quadratic convergence is a particular case of superlinear convergence. Roughly speaking, we expect
\begin{align*}
\log |\epsilon_{k+1}| & \approx \alpha (\log |\epsilon_k|) + \log L, \\
\frac{\log |\epsilon_{k+1}|}{\log |\epsilon_k|} & \approx \alpha + \frac{\log L}{\log |\epsilon_k|} \to \alpha,
\end{align*}
as $k\to\infty$.
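As a quick numerical check of this idea (a synthetic illustration, not part of the text), one can build an error sequence that obeys $\epsilon_{k+1} = c\,\epsilon_k^\alpha$ exactly and confirm that the ratios of successive log-errors approach $\alpha$:

# Synthetic check of the log-error ratio heuristic. The constants below are
# arbitrary illustrative choices, not derived from any particular function.
α, c = (1 + sqrt(5)) / 2, 0.5
ϵ = [0.1]
for k in 1:7
    push!(ϵ, c * ϵ[k]^α)        # enforce ϵ_{k+1} = c ϵ_k^α exactly
end
logerr = log10.(ϵ)
@show logerr[2:end] ./ logerr[1:end-1]    # ratios tend toward α ≈ 1.618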
Example 4.4.2 (Convergence of the secant method)
Example 4.4.2 We check the convergence of the secant method from Example 4.4.1. Again we will use extended precision to get a longer sequence than double precision allows.
f(x) = x * exp(x) - 2
x = FNC.secant(f, BigFloat(1), BigFloat(0.5), xtol=1e-80, ftol=1e-80);
We don’t know the exact root, so we use the last value as a proxy.
r = x[end]
0.8526055020137254913464724146953174668984533001514035087721073946525150656742605
Here is the sequence of errors.
ϵ = @. Float64(r - x[1:end-2])
11-element Vector{Float64}:
-0.14739449798627452
0.3526055020137255
0.04223372706144885
-0.013026425327222755
0.00042747994131549927
4.269915586133851e-6
-1.4054770126368277e-9
4.620323656624992e-15
4.999480931132388e-24
-1.7783862252641536e-38
6.845099610444838e-62
It’s not easy to see the convergence rate by staring at these numbers. We can use (4.4.8) to try to expose the superlinear convergence rate.
logerr = @. log10(abs(ϵ))
ratios = [NaN; [logerr[i+1] / logerr[i] for i in 1:length(logerr)-1]]
@pt :header=["iteration", "error", "log error", "ratio"] [eachindex(ϵ) ϵ logerr ratios]
As expected, this settles in at around 1.618.
Example 4.4.2 We check the convergence of the secant method from Example 4.4.1.
f = @(x) x .* exp(x) - 2;
x = secant(f, 1, 0.5);
We don’t know the exact root, so we use fzero to get a good proxy.
r = fzero(f, 1)
Here is the sequence of errors.
format short e
err = r - x(1:end-1)'
It’s not easy to see the convergence rate by staring at these numbers. We can use (4.4.8) to try to expose the superlinear convergence rate.
logerr = log(abs(err));
ratios = logerr(2:end) ./ logerr(1:end-1)
As expected, this settles in at around 1.618.
Example 4.4.2 We check the convergence of the secant method from Example 4.4.1.
f = lambda x: x * exp(x) - 2
x = FNC.secant(f, 1, 0.5)
print(x)
[1. 0.5 0.81037177 0.86563193 0.85217802 0.85260123
0.8526055 0.8526055 0.8526055 ]
We don’t know the exact root, so we use root_scalar to get a substitute.
from scipy.optimize import root_scalar
r = root_scalar(f, bracket=[0.5, 1]).root
print(r)
Here is the sequence of errors.
err = r - x
print(err)
[-1.47394498e-01 3.52605502e-01 4.22337271e-02 -1.30264253e-02
4.27479941e-04 4.26991559e-06 -1.40547707e-09 4.55191440e-15
0.00000000e+00]
It’s not easy to see the convergence rate by staring at these numbers. We can use (4.4.8) to try to expose the superlinear convergence rate.
logerr = log(abs(err))
for i in range(len(err) - 2):
    print(logerr[i+1] / logerr[i])
0.5444386280277934
3.0358017547194565
1.3716940021941457
1.7871469297607958
1.593780475055435
1.6485786717829443
1.62014464365765
As expected, this settles in at around 1.618.
In terms of the error as a function of the iteration number $k$, the secant method converges at a rate strictly between linear and quadratic, which is slower than Newton’s method. But error versus iteration count may not be the best means of comparison.
Often we analyze rootfinding methods by assuming that the bulk of computing time is spent evaluating the user-defined functions $f$ and $f'$. (Our simple examples and exercises mostly don’t support this assumption, but many practical applications do.) In this light, we see that Newton’s method requires two evaluations, $f(x_k)$ and $f'(x_k)$, for each iteration. The secant method, on the other hand, while it uses the two function values $f(x_k)$ and $f(x_{k-1})$ at each iteration, only needs to compute a single new one. Note that Function 4.4.1 keeps track of one previous function value rather than recomputing it.
Now suppose that $|\epsilon_k|=\delta$. Roughly speaking, two units of work (i.e., function evaluations) in Newton’s method brings us to an error of $\delta^2$. If one spreads out the improvement in the error evenly across the two steps, using
$$\delta^2 = \bigl( \delta^{\sqrt{2}} \bigr)^{\sqrt{2}},$$
it seems reasonable to say that the rate of convergence in Newton per function evaluation is $\sqrt{2}\approx 1.41$. This is actually less than the comparable rate of about 1.62 for the secant method.
Not only is the secant method easier to apply than Newton’s method in practice, it’s also more efficient—a rare win-win!
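As a concrete illustration of this accounting, here is a small sketch (not from the text) that tracks the error per $f$-evaluation for both methods on the function from Example 4.4.1, charging Newton two evaluations per step and the secant method one. The derivative formula and the reference root are supplied for this experiment only.

# Illustrative sketch: error versus number of f-evaluations for Newton vs. secant
# on f(x) = x*exp(x) - 2. Newton is charged 2 evaluations per step (f and f');
# the secant method needs only 1 new evaluation per step.
f(x)  = x * exp(x) - 2
df(x) = (x + 1) * exp(x)          # derivative, used by Newton only
r = 0.8526055020137254            # reference root found earlier in this section

let x = 1.0, evals = 0
    for k in 1:3
        x -= f(x) / df(x);  evals += 2
        println("Newton:  $evals evaluations, error ≈ $(abs(x - r))")
    end
end

let a = 1.0, b = 0.5, fa = f(a), fb = f(b), evals = 2
    for k in 1:5
        c = b - fb * (b - a) / (fb - fa)    # secant step
        a, fa = b, fb
        b, fb = c, f(c);  evals += 1
        println("Secant:  $evals evaluations, error ≈ $(abs(b - r))")
    end
end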
4.4.2 Inverse interpolation
At each iteration, the secant method constructs a linear model function that interpolates the two most recently found points on the graph of $f$. Two points determine a straight line, so this seems like a sensible choice. But as the iteration progresses, why use only the two most recent points? What would it mean to use more of them?
If we interpolate through three points by a polynomial, we get a unique quadratic function. Unfortunately, a parabola may have zero, one, or two crossings of the $x$-axis, potentially leaving some doubt as to how to define the next root estimate. On the other hand, if we turn a parabola on its side, we get a graph that intersects the $x$-axis exactly once, which is ideal for defining the next root estimate.
This leads to the idea of defining $q(y)$ as the quadratic interpolant to the points $(y_{k-2},x_{k-2})$, $(y_{k-1},x_{k-1})$, and $(y_k,x_k)$, where $y_i=f(x_i)$ for all $i$, and setting $x_{k+1}=q(0)$. The process defined in this way (given three initial estimates) is called inverse quadratic interpolation. Rather than deriving lengthy formulas for it here, we demonstrate inverse quadratic interpolation below, using fit to perform the interpolation step.
Example 4.4.3 (Inverse quadratic interpolation)
Example 4.4.3 Here we look for a root of $x+\cos(10x)$ that is close to 1.
f(x) = x + cos(10 * x)
interval = [0.5, 1.5]
plot(f, interval..., label="Function", legend=:bottomright,
grid=:y, ylim=[-0.1, 3], xlabel=L"x", ylabel=L"y")
We choose three values to get the iteration started.
x = [0.8, 1.2, 1]
y = @. f(x)
scatter!(x, y, label="initial points")
If we were using forward interpolation, we would ask for the polynomial interpolant of $y$ as a function of $x$. But that parabola has no real roots.
using Polynomials
q = Polynomials.fit(x, y, 2) # interpolating polynomial
plot!(x -> q(x), interval..., l=:dash, label="interpolant")
To do inverse interpolation, we swap the roles of $x$ and $y$ in the interpolation.
Tip
By giving two functions in the plot call, we get the parametric plot $\bigl(q(y),y\bigr)$ as a function of $y$.
plot(f, interval..., label="Function",
legend=:bottomright, grid=:y, xlabel=L"x", ylabel=L"y")
scatter!(x, y, label="initial points")
q = Polynomials.fit(y, x, 2) # interpolating polynomial
plot!(y -> q(y), y -> y, -0.1, 2.6, l=:dash, label="inverse interpolant")
We seek the value of $x$ that makes $y$ zero. This means evaluating $q$ at zero.
Let’s restart the process with BigFloat numbers to get a convergent sequence.
x = BigFloat.([8, 12, 10]) / 10
y = @. f(x)
for k = 3:12
    q = Polynomials.fit(y[k-2:k], x[k-2:k], 2)
    push!(x, q(0))
    push!(y, f(x[k+1]))
end
println("residual = $(f(x[end]))")
As far as our current precision is concerned, we have an exact root.
r = x[end]
ϵ = @. Float64(abs(r - x[1:end-1]))
logerr = @. log10(abs(ϵ))
ratios = [NaN; [logerr[i+1] / logerr[i] for i in 1:length(logerr)-1]]
@pt :header=["iteration", "error", "log error", "ratio"] [eachindex(ϵ) ϵ logerr ratios]
The convergence is probably superlinear at a rate of $\alpha=1.8$ or so.
Example 4.4.3 Here we look for a root of $x+\cos(10x)$ that is close to 1.
f = @(x) x + cos(10 * x);
interval = [0.5, 1.5];
clf, fplot(f, interval)
set(gca, 'ygrid', 'on'), axis(axis)
title('Objective function')
xlabel('x'), ylabel('y')
r = fzero(f, 1)
We choose three values to get the iteration started.
x = [0.8, 1.2, 1]';
y = f(x);
hold on, scatter(x, y)
title('Three initial points')
If we were using forward interpolation, we would ask for the polynomial interpolant of $y$ as a function of $x$. But that parabola has no real roots.
c = polyfit(x, y, 2); % coefficients of interpolant
q = @(x) polyval(c, x);
fplot(q, interval, '--')
title('Parabola model')
To do inverse interpolation, we swap the roles of $x$ and $y$ in the interpolation.
Tip
By giving two functions in the fplot call, we get the parametric plot $\bigl(q(y),y\bigr)$ as a function of $y$.
cla, fplot(f, interval)
scatter(x, y)
c = polyfit(y, x, 2); % coefficients of interpolating polynomial
q = @(y) polyval(c, y);
fplot(q, @(y) y, ylim,'--') % plot x=q(y), y=y
title('Sideways parabola')
We seek the value of $x$ that makes $y$ zero. This means evaluating $q$ at zero.
x = [x; q(0)];
y = [y; f(x(end))]
We repeat the process a few more times.
for k = 4:8
    c = polyfit(y(k-2:k), x(k-2:k), 2);
    x(k+1) = polyval(c, 0);
    y(k+1) = f(x(k+1));
end
disp('final residual:')
y(end)
Here is the sequence of errors.
format short e
err = x - r
The convergence is probably superlinear:
logerr = log(abs(err));
ratios = logerr(2:end) ./ logerr(1:end-1)
Example 4.4.3 Here we look for a root of $x+\cos(10x)$ that is close to 1.
f = lambda x: x + cos(10 * x)
xx = linspace(0.5, 1.5, 400)
fig, ax = subplots()
ax.plot(xx, f(xx), label="function")
ax.grid()
xlabel("$x$"), ylabel("$y$")
fig
We choose three values to get the iteration started.
x = array([0.8, 1.2, 1])
y = f(x)
ax.plot(x, y, "ko", label="initial points")
ax.legend()
fig
If we were using forward interpolation, we would ask for the polynomial interpolant of $y$ as a function of $x$. But that parabola has no real roots.
q = poly1d(polyfit(x, y, 2)) # interpolating polynomial
ax.plot(xx, q(xx), "--", label="interpolant")
ax.set_ylim(-0.1, 3), ax.legend()
fig
To do inverse interpolation, we swap the roles of $x$ and $y$ in the interpolation.
plot(xx, f(xx), label="function")
plot(x, y, "ko", label="initial points")
q = poly1d(polyfit(y, x, 2)) # inverse interpolating polynomial
yy = linspace(-0.1, 2.6, 400)
plot(q(yy), yy, "--", label="inverse interpolant")
grid(), xlabel("$x$"), ylabel("$y$")
legend();
We seek the value of $x$ that makes $y$ zero. This means evaluating $q$ at zero.
x = hstack([x, q(0)])
y = hstack([y, f(x[-1])])
print("x:", x, "\ny:", y)
x: [0.8 1.2 1. 1.10398139]
y: [0.65449997 2.04385396 0.16092847 1.14820652]
We repeat the process a few more times.
for k in range(6):
    q = poly1d(polyfit(y[-3:], x[-3:], 2))
    x = hstack([x, q(0)])
    y = hstack([y, f(x[-1])])
print(f"final residual is {y[-1]:.2e}")
final residual is 1.47e-14
Here is the sequence of errors.
from scipy.optimize import root_scalar
r = root_scalar(f, bracket=[0.9, 1]).root
err = x - r
print(err)
[-1.67888402e-01 2.32111598e-01 3.21115982e-02 1.36092984e-01
1.53473435e-02 3.26831473e-03 4.61743614e-04 6.29584770e-06
3.43903706e-09 6.37268016e-14]
The error seems to be superlinear, but subquadratic:
logerr = log(abs(err))
for i in range(len(err) - 1):
    print(logerr[i+1] / logerr[i])
0.818477543986624
2.354297090262193
0.5800188694773643
2.0942526256501486
1.3702985897217017
1.341928284536839
1.5592238835587917
1.6273123181772837
1.5591161372818163
4.4.3 Bracketing
Like Newton’s method, the secant and inverse quadratic interpolation methods cannot guarantee convergence. One final new idea is needed to make a (nearly) foolproof algorithm.
If $f$ is continuous on the interval $[a,b]$ and $f(a)f(b)<0$ (that is, $f$ changes sign on the interval), then $f$ must have at least one root in the interval, due to the Intermediate Value Theorem from calculus. If we come up with a new root estimate $c\in(a,b)$, then whatever sign $f(c)$ is, it is different from the sign at one of the endpoints. (Of course, if $f(c)$ is zero, we are done!) So either $[a,c]$ or $[c,b]$ is guaranteed to have a root too, and in this way we can maintain not just individual estimates but an interval that always contains a root.
The best algorithms blend the use of fast-converging methods with the guarantee provided by a bracket. For example, say that an iteration begins with a bracketing interval. Make a list of the inverse quadratic estimate, the secant estimate, and the midpoint of the current interval, and pick the first member of the list that lies within the current interval. Replace the interval with the bracketing subinterval, and start a new iteration. This is the idea behind Brent’s method, which is a very successful rootfinding algorithm.
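To make the bracketing idea concrete, here is a simplified sketch of one such hybrid step. It is only an illustration in the spirit of Brent’s method, not the actual algorithm, which adds further safeguards; the function bracketed_step and its details are this sketch’s own construction.

# One bracketed hybrid step: try inverse quadratic interpolation, then the secant
# estimate, then the midpoint, accepting the first candidate inside the bracket,
# and return a subinterval that still changes sign. (Sketch only; it also
# wastefully reevaluates f at all three points on every call.)
function bracketed_step(f, a, b, c)
    # a, b bracket a root (f(a)f(b) < 0); c is one additional previous estimate
    fa, fb, fc = f(a), f(b), f(c)
    candidates = Float64[]
    if fa != fb && fb != fc && fa != fc
        # inverse quadratic interpolation: q(0) for q(y) through (fa,a), (fb,b), (fc,c)
        push!(candidates,
            a * fb * fc / ((fa - fb) * (fa - fc)) +
            b * fa * fc / ((fb - fa) * (fb - fc)) +
            c * fa * fb / ((fc - fa) * (fc - fb)))
    end
    push!(candidates, b - fb * (b - a) / (fb - fa))    # secant estimate
    push!(candidates, (a + b) / 2)                     # midpoint (bisection)
    x = first(filter(t -> min(a, b) < t < max(a, b), candidates))
    # shrink the bracket to the part that still changes sign; x becomes the new b
    return sign(f(x)) == sign(fa) ? (b, x, b) : (a, x, b)
end

# usage on f(x) = x e^x - 2 with the bracket [0.5, 1] from Example 4.4.1:
g(x) = x * exp(x) - 2
let a = 0.5, b = 1.0, c = 0.75
    for k in 1:6
        a, b, c = bracketed_step(g, a, b, c)
    end
    @show b, g(b)    # b is the latest estimate, close to the root near 0.8526055
end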
4.4.4 Exercises
For each of Exercises 1–3, do the following steps.
(a) ✍ Rewrite the equation into the standard form for rootfinding, $f(x) = 0$.
(b) ⌨ Make a plot of $f$ over the given interval and determine how many roots lie in the interval.
(c) ⌨ Use nlsolve with ftol=1e-15 to find a reference value for each root.
(d) ⌨ Determine a bracketing interval for each root. Then use Function 4.4.1, starting with the endpoints of the bracketing interval, to find each root.
(e) ⌨ For one of the roots, use the errors in the secant sequence to determine numerically whether the convergence is apparently between linear and quadratic.
$x^2=e^{-x}$, over $[-2,2]$
$2x = \tan x$, over $[-0.2,1.4]$
$e^{x+1}=2+x$, over $[-2,2]$
⌨ Use a plot to approximately locate all the roots of $f(x)=x^{-2}-\sin(x)$ in the interval $[0.5,10]$. Then find a pair of initial points for each root such that Function 4.4.1 converges to that root.
✍ In general, the secant method formula (4.4.2) cannot be applied if $x_{k}=x_{k-1}$. However, suppose that $f(x)=ax^2+bx+c$ for constants $a$, $b$, and $c$. Show that in this case the formula can be simplified to one that is well defined when $x_{k}=x_{k-1}$. Then show that the resulting $x_{k+1}$ is the same as the result of one step of Newton’s method applied to $f$ at $x_k$.
✍ Let $f(x)=x^2$. Show that if $(1/x_1)$ and $(1/x_2)$ are positive integers, and the secant iteration is applied, then the sequence $1/x_1,1/x_2,1/x_3,\ldots$ is a Fibonacci sequence, i.e., satisfying $x_{k+1}=x_k+x_{k-1}$.
⌨ Write a function iqi(f,x₁,x₂,x₃) that performs inverse quadratic interpolation for finding a root of $f$, given three initial estimates. To find the quadratic polynomial $q(y)$ passing through the three most recent points, use fit. Test your function on $x^2=e^{-x}$, over $[-2,2]$.