After the solution of square linear systems, we generalized to the case of having more constraints to satisfy than available variables. Our next step is to do the same for nonlinear equations, thus filling out this table:
|                | linear                                                          | nonlinear                                        |
|----------------|-----------------------------------------------------------------|--------------------------------------------------|
| square         | $\mathbf{A}\mathbf{x}=\mathbf{b}$                               | $\mathbf{f}(\mathbf{x})=\boldsymbol{0}$          |
| overdetermined | $\min\, \lVert \mathbf{A}\mathbf{x} - \mathbf{b}\rVert_2$       | $\min\, \lVert \mathbf{f}(\mathbf{x}) \rVert_2$  |
As in the linear case, we consider only overdetermined problems, where $m>n$. Minimizing a positive quantity is equivalent to minimizing its square, so we could also define the result as minimizing $\phi(\mathbf{x})=\mathbf{f}(\mathbf{x})^T\mathbf{f}(\mathbf{x})$.
4.7.1 Gauss–Newton method

You should not be surprised to learn that we can formulate an algorithm by substituting a linear model function for $\mathbf{f}$. At a current estimate $\mathbf{x}_k$ we define
$$\mathbf{q}(\mathbf{x}) = \mathbf{f}(\mathbf{x}_k) + \mathbf{A}_k(\mathbf{x}-\mathbf{x}_k),$$

where $\mathbf{A}_k$ is the exact $m\times n$ Jacobian matrix, $\mathbf{J}(\mathbf{x}_k)$, or an approximation of it as described in Quasi-Newton methods.
In the square case, we solved $\mathbf{q}=\boldsymbol{0}$ to define the new value for $\mathbf{x}$, leading to the condition $\mathbf{A}_k\mathbf{s}_k=-\mathbf{f}_k$, where $\mathbf{s}_k=\mathbf{x}_{k+1}-\mathbf{x}_k$. Now, with $m>n$, we cannot expect to solve $\mathbf{q}=\boldsymbol{0}$, so instead we define $\mathbf{x}_{k+1}$ as the value that minimizes $\|\mathbf{q}\|_2$.
Algorithm 4.7.1 (Gauss–Newton method)
Given $\mathbf{f}$ and a starting value $\mathbf{x}_1$, for each $k=1,2,3,\ldots$:
1. Compute $\mathbf{y}_k = \mathbf{f}(\mathbf{x}_k)$ and $\mathbf{A}_k$, the exact or approximate Jacobian matrix at $\mathbf{x}_k$.
2. Solve the linear least-squares problem $\operatorname{argmin}\, \| \mathbf{A}_k\mathbf{s}_k + \mathbf{y}_k\|_2$ for $\mathbf{s}_k$.
3. Let $\mathbf{x}_{k+1} = \mathbf{x}_k + \mathbf{s}_k$.

In brief, Gauss–Newton solves a series of linear least-squares problems in order to solve a nonlinear least-squares problem.
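To make the steps concrete, here is a bare-bones Julia sketch of the iteration. It is only an illustration, not one of the book's functions: it assumes a function jac returning the Jacobian is supplied, runs a fixed number of iterations, and omits any stopping test.

function gaussnewton(f, jac, x₁; maxiter=20)
    x = float(x₁)
    for k in 1:maxiter
        y = f(x)            # residual at the current iterate
        A = jac(x)          # m×n Jacobian (exact or approximate)
        s = A \ (-y)        # backslash solves the linear least-squares subproblem
        x = x + s           # take the Gauss–Newton step
    end
    return x
end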
Surprisingly, Function 4.5.2 and Function 4.6.3, which were introduced for the case of $m=n$ nonlinear equations, work without modification as the Gauss–Newton method for the overdetermined case! The reason is that the backslash operator applies equally well to the linear system and linear least-squares problems, and nothing else in those functions was written with explicit reference to $n$.
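For instance, the same backslash call handles both cases; here is a small demonstration with random matrices, purely illustrative:

A = rand(4, 4);  b = rand(4);
x_sq = A \ b        # square matrix: solves A*x = b exactly
B = rand(6, 4);  c = rand(6);
x_ls = B \ c        # tall matrix: minimizes the 2-norm of B*x - c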
4.7.2 Convergence

In the multidimensional Newton method for a nonlinear system, we expect quadratic convergence to a solution in the typical case. For the Gauss–Newton method, the picture is more complicated.
As always in least-squares problems, the residual $\mathbf{f}(\mathbf{x})$ will not necessarily be zero when $\|\mathbf{f}\|$ is minimized. Suppose that the minimum value of $\|\mathbf{f}\|$ is $R>0$. In general, we might observe quadratic-like convergence until the iterate $\mathbf{x}_k$ is within distance $R$ of a true minimizer, and linear convergence thereafter. When $R$ is not sufficiently small, the convergence can be quite slow.
Example 4.7.1 (Convergence of nonlinear least squares)
We will observe the convergence of Function 4.6.3 for different levels of the minimum least-squares residual. We start with a function mapping from $\mathbb{R}^2$ into $\mathbb{R}^3$, and a point that will be near the optimum.
g(x) = [sin(x[1] + x[2]), cos(x[1] - x[2]), exp(x[1] - x[2])]
p = [1, 1];
The function $\mathbf{g}(\mathbf{x}) - \mathbf{g}(\mathbf{p})$ obviously has a zero residual at $\mathbf{p}$. We'll make different perturbations of that function in order to create nonzero residuals.
Tip: @sprintf is a way to format numerical values as strings, patterned after the C function printf.
using Printf
plt = plot(xlabel="iteration", yaxis=(:log10, "error"),
title="Convergence of Gauss–Newton")
for R in [1e-3, 1e-2, 1e-1]
# Define the perturbed function.
f(x) = g(x) - g(p) + R * normalize([-1, 1, -1])
x = FNC.levenberg(f, [0, 0])
r = x[end]
err = [norm(x - r) for x in x[1:end-1]]
normres = norm(f(r))
plot!(err, label=@sprintf("R=%.2g", normres))
end
plt
In the least perturbed case, where the minimized residual is less than $10^{-3}$, the convergence is plausibly quadratic. At the next level up, the convergence starts similarly but suddenly stagnates for a long time. In the most perturbed case, the quadratic phase is nearly gone and the overall shape looks linear.
Example 4.7.1 We will observe the convergence of Function 4.6.3 for different levels of the minimum least-squares residual. We start with a function mapping from $\mathbb{R}^2$ into $\mathbb{R}^3$, and a point that will be near the optimum.
g = @(x) [sin(x(1) + x(2)); cos(x(1) - x(2)); exp(x(1) - x(2))];
p = [1; 1];
The function $\mathbf{g}(\mathbf{x}) - \mathbf{g}(\mathbf{p})$ obviously has a zero residual at $\mathbf{p}$. We'll make different perturbations of that function in order to create nonzero residuals.
Tip: sprintf is a way to format numerical values as strings, patterned after the C function printf.
clf
labels = [];
for R = [1e-3, 1e-2, 1e-1]
% Define the perturbed function.
f = @(x) g(x) - g(p) + R * [-1; 1; -1] / sqrt(3);
x = levenberg(f, [0; 0]);
r = x(:, end);
err = abs(x(1, 1:end-1) - r(1));
normres = norm(f(r));
semilogy(err), hold on
labels = [labels; sprintf("R=%.2g", normres)];
end
xlabel("iteration"), ylabel("error")
legend(labels);
In the least perturbed case, where the minimized residual is less than $10^{-3}$, the convergence is plausibly quadratic. At the next level up, the convergence starts similarly but suddenly stagnates for a long time. In the most perturbed case, the quadratic phase is nearly gone and the overall shape looks linear.
Example 4.7.1 We will observe the convergence of Function 4.6.3 for different levels of the minimum least-squares residual. We start with a function mapping from $\mathbb{R}^2$ into $\mathbb{R}^3$, and a point that will be near the optimum.
g = lambda x: array([sin(x[0] + x[1]), cos(x[0] - x[1]), exp(x[0] - x[1])])
p = array([1, 1])
The function $\mathbf{g}(\mathbf{x}) - \mathbf{g}(\mathbf{p})$ obviously has a zero residual at $\mathbf{p}$. We'll make different perturbations of that function in order to create nonzero residuals.
for R in [1e-3, 1e-2, 1e-1]:
# Define the perturbed function.
f = lambda x: g(x) - g(p) + R * array([-1, 1, -1]) / sqrt(3)
x = FNC.levenberg(f, [0, 0])
r = x[-1]
err = [norm(x[j] - r) for j in range(len(x) - 1)]
normres = norm(f(r))
semilogy(err, label=f"R={normres:.2g}")
title("Convergence of Gauss–Newton")
xlabel("iteration"), ylabel("error")
legend();
A warning is issued: "Iteration did not find a root."
In the least perturbed case, where the minimized residual is less than $10^{-3}$, the convergence is plausibly quadratic. At the next level up, the convergence starts similarly but suddenly stagnates for a long time. In the most perturbed case, the quadratic phase is nearly gone and the overall shape looks linear.
4.7.3 Nonlinear data fitting

In Fitting functions to data we saw how to fit functions to data values, provided that the set of candidate fitting functions depends linearly on the undetermined coefficients. We now have a tool to generalize that process to fitting functions that depend nonlinearly on unknown parameters.
Suppose that $(t_i,y_i)$ for $i=1,\ldots,m$ are given points. We wish to model the data by a function $g(t,\mathbf{x})$ that depends on unknown parameters $x_1,\ldots,x_n$ in an arbitrary way. A standard approach is to minimize the discrepancy between the model and the observations, in a least-squares sense. Define
$$\mathbf{f}(\mathbf{x}) = \left[\, g(t_i,\mathbf{x})-y_i \,\right]_{\,i=1,\ldots,m}.$$

We call $\mathbf{f}$ a misfit function. By minimizing $\bigl\| \mathbf{f}(\mathbf{x}) \bigr\|^2$, we get the best possible fit to the data. If an explicit Jacobian matrix is desired for the minimization, we can compute
$$\mathbf{f}\,'(\mathbf{x}) = \left[ \frac{\partial}{\partial x_j} g(t_i,\mathbf{x}) \right]_{\,i=1,\ldots,m;\ j=1,\ldots,n}.$$

The form of $g$ is up to the modeler. There may be compelling theoretical choices, or you may just be looking for enough algebraic power to express the data well. Naturally, in the special case where the dependence on the parameters is linear, i.e.,

$$g(t,\mathbf{c}) = c_1 g_1(t) + c_2 g_2(t) + \cdots + c_n g_n(t),$$

then the misfit function is also linear in $\mathbf{c}$ and the fitting problem reduces to linear least squares.
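To sketch how the misfit and its Jacobian translate into code, suppose a model g(t, x), data vectors t and y, and partial derivatives dg(t, x, j) = ∂g/∂x_j are available; these names are placeholders, not definitions from the book. In Julia, the translation is nearly literal:

# misfit vector: one entry per data point, matching the definition above
misfit(x) = [g(t[i], x) - y[i] for i in eachindex(t)]
# Jacobian: m×n matrix of partial derivatives with respect to the parameters
misfitjac(x) = [dg(t[i], x, j) for i in eachindex(t), j in eachindex(x)]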
Example 4.7.2 (Nonlinear data fitting)
Inhibited enzyme reactions often follow what are known as Michaelis–Menten kinetics, in which a reaction rate $w$ follows a law of the form
$$w(s) = \frac{V s}{K_m + s},$$

where $s$ is the concentration of a substrate. The real values $V$ and $K_m$ are parameters that are free to fit to data. For this example, we cook up some artificial data with $V=2$ and $K_m=1/2$.
m = 25;
s = range(0.05, 6, length=m)
ŵ = @. 2 * s / (0.5 + s) # exactly on the curve
w = @. ŵ + 0.15 * cos(2 * exp(s / 16) * s); # smooth noise added
scatter(s, w, label="noisy data",
xlabel="s", ylabel="v", leg=:bottomright)
plot!(s, ŵ, l=:dash, color=:black, label="perfect data")
The idea is to pretend that we know nothing of the origins of this data and use nonlinear least squares to find the parameters in the theoretical model function $w(s)$. In (4.7.2), the $s$ variable plays the role of $t$, and $w$ plays the role of $g$.
Tip: Putting comma-separated values on the left of an assignment will destructure the right-hand side, drawing individual assignments from entries of a vector, for example.
function misfit(x)
V, Km = x # rename components for clarity
return @. V * s / (Km + s) - w
end
misfit (generic function with 1 method)
In the Jacobian, the derivatives are with respect to the parameters in $\mathbf{x}$.
function misfitjac(x)
V, Km = x # rename components for clarity
J = zeros(m, 2)
J[:, 1] = @. s / (Km + s) # dw/dV
J[:, 2] = @. -V * s / (Km + s)^2 # dw/d_Km
return J
end
misfitjac (generic function with 1 method)
x₁ = [1, 0.75]
x = FNC.newtonsys(misfit, misfitjac, x₁)
@show V, Km = x[end] # final values
(V, Km) = x[end] = [1.968652598378229, 0.4693037307416775]
2-element Vector{Float64}:
1.968652598378229
0.4693037307416775
The final values are reasonably close to the values $V=2$, $K_m=0.5$ that we used to generate the noise-free data. Graphically, the model looks close to the original data.
model(s) = V * s / (Km + s)
plot!(model, 0, 6, label="nonlinear fit")
For this particular model, we also have the option of linearizing the fit process. Rewrite the model as
$$\frac{1}{w} = \frac{\alpha}{s} + \beta = \alpha \cdot s^{-1} + \beta$$

for the new fitting parameters $\alpha=K_m/V$ and $\beta=1/V$. This corresponds to the misfit function whose entries are

$$f_i([\alpha,\beta]) = \left(\alpha \cdot \frac{1}{s_i} + \beta\right) - \frac{1}{w_i}$$

for $i=1,\ldots,m$. Although this misfit is nonlinear in $s$ and $w$, it's linear in the unknown parameters $\alpha$ and $\beta$. This lets us pose and solve it as a linear least-squares problem.
A = [s .^ (-1) s .^ 0]
u = 1 ./ w
α, β = A \ u
2-element Vector{Float64}:
0.12476333709901538
0.5713959100431231
The two fits are different because they do not optimize the same quantities.
linmodel(x) = 1 / (β + α / x)
plot!(linmodel, 0, 6, label="linearized fit")
The truly nonlinear fit is clearly better in this case. It optimizes a residual for the original measured quantity rather than a transformed one we picked for algorithmic convenience.
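To put a number on the comparison, we can evaluate each fit's residual in the original variable w, using the values of V, Km, α, and β computed above:

err_nonlinear = @. V * s / (Km + s) - w      # pointwise misfit of the nonlinear fit
err_linearized = @. 1 / (β + α / s) - w      # pointwise misfit of the linearized fit
@show norm(err_nonlinear) norm(err_linearized);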
Example 4.7.2
m = 25; V = 2; Km = 0.5;
s = linspace(0.05, 6, m)';
w = V * s ./ (Km + s); % exactly on the curve
w = w + 0.15 * cos(2 * exp(s / 16) .* s); % noise added
clf, fplot(@(s) V * s ./ (Km + s), [0, 6], '--')
hold on, scatter(s, w)
xlabel('concentration'), ylabel('reaction rate')
labels = ["ideal", "noisy data"];
legend(labels, 'location', 'east');
The idea is to pretend that we know nothing of the origins of this data and use nonlinear least squares to find the parameters in the theoretical model function $w(s)$. In (4.7.2), the $s$ variable plays the role of $t$, and $w$ plays the role of $g$. In the Jacobian, the derivatives are with respect to the parameters in $\mathbf{x}$.
function [f, J] = misfit(c, s, w)
V = c(1); Km = c(2);
f = V * s ./ (Km + s) - w;
J(:,1) = s ./ (Km + s); % d/d(V)
J(:,2) = -V * s ./ (Km + s).^2; % d/d(Km)
end
The misfit function above has to know the parameters x that are being optimized as well as the data s and w that remain fixed. We use a closure to pass the data values along.
f = @(x) misfit(x, s, w);
Now we have a function that accepts a single 2-vector input and returns a 25-vector output. We can pass this function to newtonsys to find the best-fit parameters.
x1 = [1; 0.75];
x = newtonsys(f, x1);
V = x(1, end), Km = x(2, end) % final values
model = @(s) V * s ./ (Km + s); % best-fit model
The final values are reasonably close to the values $V=2$, $K_m=0.5$ that we used to generate the noise-free data. Graphically, the model looks close to the original data:
final_misfit_norm = norm(model(s) - w)
hold on, fplot(model, [0, 6])
title('Michaelis-Menten fitting')
labels = [labels, "nonlinear fit"];
legend(labels, 'location', 'east');
For this particular model, we also have the option of linearizing the fit process. Rewrite the model as
$$\frac{1}{w} = \frac{\alpha}{s} + \beta = \alpha \cdot s^{-1} + \beta$$

for the new fitting parameters $\alpha=K_m/V$ and $\beta=1/V$. This corresponds to the misfit function whose entries are

$$f_i([\alpha,\beta]) = \left(\alpha \cdot \frac{1}{s_i} + \beta\right) - \frac{1}{w_i}$$

for $i=1,\ldots,m$. Although this misfit is nonlinear in $s$ and $w$, it's linear in the unknown parameters $\alpha$ and $\beta$. This lets us pose and solve it as a linear least-squares problem.
u = 1 ./ w;
A = [s.^(-1), s.^0];
z = A \ u;
alpha = z(1); beta = z(2);
The two fits are different because they do not optimize the same quantities.
linmodel = @(s) 1 ./ (beta + alpha ./ s);
final_misfit_linearized = norm(linmodel(s) - w)
fplot(linmodel, [0, 6])
labels = [labels, "linearized fit"];
legend(labels, 'location', 'east');
The truly nonlinear fit is clearly better in this case. It optimizes a residual for the original measured quantity rather than a transformed one we picked for algorithmic convenience.
Example 4.7.2
m = 25
V, Km = 2, 0.5
s = linspace(0.05, 6, m)
model = lambda x: V * x / (Km + x)
w = model(s) + 0.15 * cos(2 * exp(s / 16) * s) # noise added
fig, ax = subplots()
ax.scatter(s, w, label="data")
ax.plot(s, model(s), 'k--', label="unperturbed model")
xlabel("s"), ylabel("w")
legend();
The idea is to pretend that we know nothing of the origins of this data and use nonlinear least squares to find the parameters in the theoretical model function $w(s)$. In (4.7.2), the $s$ variable plays the role of $t$, and $w$ plays the role of $g$.
Tip: Putting comma-separated values on the left of an assignment will destructure the right-hand side, drawing individual assignments from entries of a vector, for example.
def misfit(c):
V, Km = c # rename components for clarity
f = V * s / (Km + s) - w
return f
In the Jacobian, the derivatives are with respect to the parameters in $\mathbf{x}$.
def misfitjac(x):
V, Km = x # rename components for clarity
J = zeros([m, 2])
J[:, 0] = s / (Km + s) # d/d(V)
J[:, 1] = -V * s / (Km + s)**2 # d/d(Km)
return J
x1 = [1, 0.75]
x = FNC.newtonsys(misfit, misfitjac, x1)
V, Km = x[-1] # final values
print(f"estimates are V = {V:.3f}, Km = {Km:.3f}")
estimates are V = 1.969, Km = 0.469
The final values are reasonably close to the values $V=2$, $K_m=0.5$ that we used to generate the noise-free data. Graphically, the model looks close to the original data.
# since V and Km have been updated, model() is too
ax.plot(s, model(s), label="nonlinear fit")
For this particular model, we also have the option of linearizing the fit process. Rewrite the model as
$$\frac{1}{w} = \frac{\alpha}{s} + \beta = \alpha \cdot s^{-1} + \beta$$

for the new fitting parameters $\alpha=K_m/V$ and $\beta=1/V$. This corresponds to the misfit function whose entries are

$$f_i([\alpha,\beta]) = \left(\alpha \cdot \frac{1}{s_i} + \beta\right) - \frac{1}{w_i}$$

for $i=1,\ldots,m$. Although this misfit is nonlinear in $s$ and $w$, it's linear in the unknown parameters $\alpha$ and $\beta$. This lets us pose and solve it as a linear least-squares problem.
from numpy.linalg import lstsq
A = array( [[1 / s[i], 1.0] for i in range(len(s))] )
z = lstsq(A, 1 / w, rcond=None)[0]
alpha, beta = z
print("alpha:", alpha, "beta:", beta)
alpha: 0.12476333709901544 beta: 0.571395910043123
The two fits are different; they do not optimize the same quantities.
linmodel = lambda x: 1 / (beta + alpha / x)
ax.plot(s, linmodel(s), label="linear fit")
ax.legend()
fig
The truly nonlinear fit is clearly better in this case. It optimizes a residual for the original measured quantity rather than a transformed one we picked for algorithmic convenience.
4.7.4 Exercises

1. ✍ Define $\mathbf{f}(x)=[\, x-8, \; x^2-4 \,]$.
(a) Write out the linear model of $\mathbf{f}$ at $x=2$.
(b) Find the estimate produced by one step of the Gauss–Newton method, starting at $x=2$.
2. ✍ (Continuation of Exercise 1.) The Gauss–Newton method replaces $\mathbf{f}(\mathbf{x})$ by a linear model and minimizes the norm of its residual. An alternative is to replace $\| \mathbf{f}(\mathbf{x}) \|_2^2$ by a scalar quadratic model $q(\mathbf{x})$ and minimize that.
(a) Using $\mathbf{f}(x) = [\, x-8, \; x^2-4 \,]$, let $q(x)$ be defined by the first three terms in the Taylor series for $\| \mathbf{f}(x) \|_2^2$ at $x=2$.
(b) Find the unique $x$ that minimizes $q(x)$. Is the result the same as the estimate produced by Gauss–Newton?
3. ⌨ A famous result by Kermack and McKendrick (Kermack et al., 1927) suggests that in epidemics that kill only a small fraction of a susceptible population, the death rate as a function of time is well modeled by

$$w'(t) = A \operatorname{sech}^2[B(t-C)]$$

for constant values of the parameters $A, B, C$. Since the maximum of sech is $\operatorname{sech}(0)=1$, $A$ is the maximum death rate and $C$ is the time of peak deaths. You will use this model to fit the deaths per week from plague recorded in Mumbai during 1906:
5, 10, 17, 22, 30, 50, 51, 90, 120, 180, 292, 395, 445, 775, 780,
700, 698, 880, 925, 800, 578, 400, 350, 202, 105, 65, 55, 40, 30, 20
(a) Use Function 4.6.3 to find the best least-squares fit to the data using the $\operatorname{sech}^2$ model. Make a plot of the model fit superimposed on the data.
(b) Repeat part (a) using only the first 15 data values.
4. ⌨ (Variation on Exercise 4.5.6.) Suppose the points $(x_i,y_i)$ for $i=1,\ldots,m$ are given, and the goal is to find the circle with center $(a,b)$ and radius $r$ that best fits the points. Define

$$f_i(a,b,r) = (a-x_i)^2 + (b-y_i)^2 - r^2, \qquad i=1,\ldots,m.$$

Then we can define the best circle as the one that minimizes $\|\mathbf{f}\|$.
Define data points as follows:
m = 30; t = 2π*rand(m);
x = @. -2 + 5*cos(t); y = @. 1 + 5*sin(t);
x += 0.2*randn(m); y += 0.2*randn(m);
Use Function 4.6.3 to find the best-fit circle, and make a plot of the circle superimposed on the points.
5. ⌨ The position of the upper lid during an eye blink can be measured from high-speed video (Wu et al., 2014), and it may be possible to classify blinks based in part on fits to the lid position (Brosch et al., 2017). The lid position function proposed to fit blinks is a product of a monomial or polynomial multiplying a decaying exponential (Berke & Mueller, 1998). In this problem, you will generate representative data, add a small amount of noise to it, and then perform nonlinear least-squares fits to the data.
(a) Consider the function $y(\mathbf{a}) = a_1 t^2 \exp\left( -a_2 t^{a_3} \right)$, using the vector of coefficients $\mathbf{a} = [a_1,a_2,a_3]$, and create synthetic eyelid position data as follows:
N = 20; # number of time values
t = (1:N)/N; # equally spaced to t=1
a = [10, 10, 2]; # baseline values
y = @. a[1]*t^2*exp(-a[2]*t^a[3]); # ideal data
ym = copy(y); # vector for data
ir = 1:N-1; # range to add noise
noise = 0.03; # amplitude of noise
ym[ir] += noise*rand(N-1); # add noise
(b) Using the data (t, ym), find the nonlinear least-squares fit using Function 4.6.3.

(c) Plot the fits using np = 100 points over t = (1:np)/np, together with symbols for the N measured data points ym.
(d) Increase the noise to 5% and 10%. You may have to increase the number of measured points N and/or the maximum number of iterations. How close are the coefficients? Plot the data and the resulting fit for each case.
6. ⌨ Repeat the previous problem using the fitting function $y(\mathbf{a}) = (a_1+a_2 t + a_3 t^2)\, t^2 \exp\left( -a_4 t^{a_5} \right)$, using the vector of coefficients $\mathbf{a} = [a_1,\ldots,a_5]$. (This was the choice used in Brosch et al. (2017).) Use a = [20, -10, -8, 7, 2] to create the data and as an initial guess for the coefficients for the fit to the noisy data.