
4.7 Nonlinear least squares

After the solution of square linear systems, we generalized to the case of having more constraints to satisfy than available variables. Our next step is to do the same for nonlinear equations, thus filling out this table:

|  | linear | nonlinear |
|:--|:--|:--|
| **square** | $\mathbf{A}\mathbf{x}=\mathbf{b}$ | $\mathbf{f}(\mathbf{x})=\boldsymbol{0}$ |
| **overdetermined** | $\min\, \lVert \mathbf{A}\mathbf{x} - \mathbf{b} \rVert_2$ | $\min\, \lVert \mathbf{f}(\mathbf{x}) \rVert_2$ |

As in the linear case, we consider only overdetermined problems, where $m>n$. Minimizing a nonnegative quantity is equivalent to minimizing its square, so we could also define the result as minimizing $\phi(\mathbf{x})=\mathbf{f}(\mathbf{x})^T\mathbf{f}(\mathbf{x})$.

4.7.1 Gauss–Newton method

You should not be surprised to learn that we can formulate an algorithm by substituting a linear model function for $\mathbf{f}$. At a current estimate $\mathbf{x}_k$ we define

$$
\mathbf{q}(\mathbf{x}) = \mathbf{f}(\mathbf{x}_k) + \mathbf{A}_k(\mathbf{x}-\mathbf{x}_k),
$$

where $\mathbf{A}_k$ is the exact $m\times n$ Jacobian matrix, $\mathbf{J}(\mathbf{x}_k)$, or an approximation of it as described in Quasi-Newton methods.

In the square case, we solved $\mathbf{q}=\boldsymbol{0}$ to define the new value for $\mathbf{x}$, leading to the condition $\mathbf{A}_k\mathbf{s}_k=-\mathbf{f}_k$, where $\mathbf{s}_k=\mathbf{x}_{k+1}-\mathbf{x}_k$ and $\mathbf{f}_k=\mathbf{f}(\mathbf{x}_k)$. Now, with $m>n$, we cannot expect to solve $\mathbf{q}=\boldsymbol{0}$ exactly, so instead we define $\mathbf{x}_{k+1}$ as the value that minimizes $\| \mathbf{q} \|_2$.

In brief, Gauss–Newton solves a series of linear least-squares problems in order to solve a nonlinear least-squares problem.
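
To make the iteration concrete, here is a bare-bones Julia sketch of this idea. It is not the book's Function 4.5.2 or Function 4.6.3; the name `gaussnewton`, the keyword arguments, and the stopping test are all illustrative choices.

```julia
using LinearAlgebra

# Minimal Gauss–Newton sketch (illustrative, not the book's implementation).
# f maps R^n to R^m with m > n; jac(x) returns the m×n Jacobian at x,
# or an approximation to it.
function gaussnewton(f, jac, x₁; maxiter=40, tol=1e-12)
    x = float(x₁)
    for k in 1:maxiter
        s = -(jac(x) \ f(x))      # linear least-squares subproblem via backslash
        x += s
        norm(s) ≤ tol * norm(x) && break   # stop once the step is negligible
    end
    return x
end
```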

Surprisingly, Function 4.5.2 and Function 4.6.3, which were introduced for the case of $m=n$ nonlinear equations, work without modification as the Gauss–Newton method for the overdetermined case! The reason is that the backslash operator applies equally well to linear systems and to linear least-squares problems, and nothing else in those functions refers explicitly to $n$.
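
A quick demonstration of that fact: the same backslash syntax that solves a square system returns the least-squares solution when the matrix is tall.

```julia
using LinearAlgebra

A = [1.0 0; 0 1; 1 1]   # 3×2, overdetermined
b = [1.0, 2.0, 3.0]
x = A \ b               # least-squares solution: minimizes ‖Ax − b‖₂
@show norm(A*x - b)     # residual is nonzero but minimal
```

If `A` were square, the identical call would solve $\mathbf{A}\mathbf{x}=\mathbf{b}$ exactly.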

4.7.2 Convergence

In the multidimensional Newton method for a nonlinear system, we expect quadratic convergence to a solution in the typical case. For the Gauss–Newton method, the picture is more complicated.

As always in least-squares problems, the residual $\mathbf{f}(\mathbf{x})$ will not necessarily be zero when $\|\mathbf{f}\|$ is minimized. Suppose that the minimum value of $\|\mathbf{f}\|$ is $R>0$. In general, we might observe quadratic-like convergence until the iterate $\mathbf{x}_k$ is within distance $R$ of a true minimizer, and linear convergence thereafter. When $R$ is not sufficiently small, the convergence can be quite slow.
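
The following toy experiment sketches this behavior; the data, model, and starting point are all made up for illustration. The perturbation keeps the data outside the model family, so $R>0$, and the printed step sizes can be expected to settle into only linear decrease.

```julia
using LinearAlgebra

t = collect(0:0.5:3)
y = exp.(-t) .+ 0.05 .* cos.(6t)       # data the model cannot fit exactly
f(x) = x[1] .* exp.(x[2] .* t) .- y    # misfit for the model g(t,x) = x₁·exp(x₂t)
jac(x) = [exp.(x[2] .* t)  (x[1] .* t .* exp.(x[2] .* t))]

let x = [1.0, -1.0]
    for k in 1:8
        s = -(jac(x) \ f(x))           # Gauss–Newton step
        x += s
        println("step $k:  ‖s‖ = ", norm(s))
    end
end
```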

4.7.3 Nonlinear data fitting

In Fitting functions to data we saw how to fit functions to data values, provided that the set of candidate fitting functions depends linearly on the undetermined coefficients. We now have a tool to generalize that process to fitting functions that depend nonlinearly on unknown parameters.

Suppose that $(t_i,y_i)$ for $i=1,\ldots,m$ are given points. We wish to model the data by a function $g(t,\mathbf{x})$ that depends on unknown parameters $x_1,\ldots,x_n$ in an arbitrary way. A standard approach is to minimize the discrepancy between the model and the observations, in a least-squares sense. Define

$$
\mathbf{f}(\mathbf{x}) = \bigl[\, g(t_i,\mathbf{x})-y_i \,\bigr]_{\,i=1,\ldots,m}.
$$

We call $\mathbf{f}$ a misfit function. By minimizing $\bigl\| \mathbf{f}(\mathbf{x}) \bigr\|^2$, we get the best possible fit to the data. If an explicit Jacobian matrix is desired for the minimization, we can compute

$$
\mathbf{f}{\,}'(\mathbf{x}) = \left[ \frac{\partial}{\partial x_j} g(t_i,\mathbf{x}) \right]_{\,i=1,\ldots,m;\; j=1,\ldots,n}.
$$
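
Hand-coded Jacobians are easy to get wrong, so a finite-difference comparison is a worthwhile sanity check. The sketch below uses the illustrative model $g(t,\mathbf{x})=x_1 e^{x_2 t}$; the parameter values and step size $\delta$ are arbitrary choices.

```julia
using LinearAlgebra

t = [0.0, 1.0, 2.0]
g(t, x) = x[1] * exp(x[2] * t)
# analytic Jacobian: column j holds ∂g(tᵢ, x)/∂xⱼ
J(x) = [exp.(x[2] .* t)  (x[1] .* t .* exp.(x[2] .* t))]

x = [2.0, -0.5]
δ = 1e-6
Jfd = zeros(length(t), 2)
for j in 1:2
    xp = copy(x); xp[j] += δ
    Jfd[:, j] = (g.(t, Ref(xp)) .- g.(t, Ref(x))) ./ δ   # forward difference
end
@show norm(J(x) - Jfd)   # should be small, on the order of δ
```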

The form of $g$ is up to the modeler. There may be compelling theoretical choices, or you may just be looking for enough algebraic power to express the data well. Naturally, in the special case where the dependence on $\mathbf{x}$ is linear, i.e.,

$$
g(t,\mathbf{x}) = x_1 g_1(t) + x_2 g_2(t) + \cdots + x_n g_n(t),
$$

then the misfit function is also linear in $\mathbf{x}$ and the fitting problem reduces to linear least squares.
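
For instance, with the polynomial basis $g_j(t)=t^{\,j-1}$ (an arbitrary illustrative choice, with made-up data), no iteration is needed at all:

```julia
t = [0.0, 0.5, 1.0, 1.5, 2.0]
y = [1.0, 1.3, 1.9, 2.8, 4.1]   # hypothetical observations
G = [t.^0 t.^1 t.^2]            # m×n matrix with entries gⱼ(tᵢ)
x = G \ y                       # one backslash solve; no iteration needed
```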

4.7.4 Exercises
