
3.1 Fitting functions to data

In Polynomial interpolation we saw how a polynomial can be used to interpolate data—that is, to construct a continuous function that passes through a set of prescribed values. But interpolation may not be appropriate in many applications.

In many cases we can get better results by relaxing the interpolation requirement. In the polynomial case this allows us to lower the degree of the polynomial, which limits the number of local max and min points. Let $(t_i,y_i)$ for $i=1,\ldots,m$ be the given points. We will represent the data by the polynomial

$$
y \approx f(t) = c_1 + c_2 t + \cdots + c_{n-1} t^{n-2} + c_n t^{n-1},
$$

with $n<m$. Just as in (2.1.3), we can express a vector of $f$-values by a matrix-vector multiplication. In other words, we seek an approximation

$$
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_m \end{bmatrix}
\approx
\begin{bmatrix} f(t_1) \\ f(t_2) \\ f(t_3) \\ \vdots \\ f(t_m) \end{bmatrix}
=
\begin{bmatrix}
1 & t_1 & \cdots & t_1^{n-1} \\
1 & t_2 & \cdots & t_2^{n-1} \\
1 & t_3 & \cdots & t_3^{n-1} \\
\vdots & \vdots & & \vdots \\
1 & t_m & \cdots & t_m^{n-1}
\end{bmatrix}
\begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}
= \mathbf{V} \mathbf{c}.
$$

Note that $\mathbf{V}$ has the same structure as the Vandermonde matrix in (2.1.3) but is $m\times n$, thus taller than it is wide. It's impossible in general to satisfy $m$ conditions with $n<m$ variables, and we say the system is overdetermined. Rather than solving the system exactly, we have to find the best approximation. Below we specify precisely what is meant by this, but first we note that Julia uses the same backslash notation to solve the problem in both the square and overdetermined cases.
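As a minimal sketch (the data values below are made up purely for illustration), we can set up the rectangular Vandermonde system above and let backslash return the best-fitting coefficients:

```julia
# Made-up data: m = 5 points, fit by a quadratic (n = 3).
t = [0.0, 0.5, 1.0, 1.5, 2.0]
y = [1.0, 1.6, 2.1, 2.4, 2.2]
m, n = length(t), 3

# The m×n Vandermonde-type matrix with columns 1, t, t².
V = [ t[i]^(j-1) for i in 1:m, j in 1:n ]

# For a tall matrix, backslash returns the least-squares solution.
c = V \ y
```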

3.1.1 The least-squares formulation

In the most general terms, our fitting functions take the form

$$
f(t) = c_1 f_1(t) + \cdots + c_n f_n(t),
$$

where $f_1,\ldots,f_n$ are all known functions with no undetermined parameters. This leaves only $c_1,\ldots,c_n$ to be determined. The essential feature of a linear least-squares problem is that the fit depends only linearly on the unknown parameters. For instance, a function of the form $f(t)=c_1 + c_2 e^{c_3 t}$ is not of this type, because $c_3$ appears nonlinearly inside the exponential.

At each observation $(t_i,y_i)$, we define a residual, $y_i - f(t_i)$. A sensible formulation of the fitting criterion is to minimize

$$
R(c_1,\ldots,c_n) = \sum_{i=1}^m \left[ y_i - f(t_i) \right]^2,
$$

over all possible choices of parameters $c_1,\ldots,c_n$. We can apply linear algebra to write the problem in the form $R=\mathbf{r}^T \mathbf{r}$, where

$$
\mathbf{r} =
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_{m-1} \\ y_m \end{bmatrix}
-
\begin{bmatrix}
f_1(t_1) & f_2(t_1) & \cdots & f_n(t_1) \\
f_1(t_2) & f_2(t_2) & \cdots & f_n(t_2) \\
& \vdots & \\
f_1(t_{m-1}) & f_2(t_{m-1}) & \cdots & f_n(t_{m-1}) \\
f_1(t_m) & f_2(t_m) & \cdots & f_n(t_m)
\end{bmatrix}
\begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}.
$$

Recalling that $\mathbf{r}^T\mathbf{r}=\| \mathbf{r} \|_2^2$, and renaming the variables to standardize the statement, we arrive at the general linear least-squares problem: given $\mathbf{A}\in\mathbb{R}^{m\times n}$ and $\mathbf{b}\in\mathbb{R}^{m}$, with $m>n$, find

$$
\operatorname{argmin}_{\mathbf{x}\in\mathbb{R}^{n}} \, \| \mathbf{b} - \mathbf{A}\mathbf{x} \|_2^2.
$$

While we could choose to minimize in any vector norm, the 2-norm is the most common and convenient choice. For the rest of this chapter we exclusively use the 2-norm. In the edge case $m=n$ for a nonsingular $\mathbf{A}$, the definitions of the linear least-squares and linear systems problems coincide: the solution of $\mathbf{A}\mathbf{x}=\mathbf{b}$ makes $\mathbf{r}=\boldsymbol{0}$, which is the global minimum since $\| \mathbf{r} \|_2^2 \ge 0$.
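Here is a minimal sketch of the general formulation, again with made-up data and an arbitrarily chosen basis: each column of the matrix holds one known basis function evaluated at the data points, and backslash minimizes the residual 2-norm.

```julia
using LinearAlgebra

# Made-up data, fit with the basis f₁(t) = 1, f₂(t) = t, f₃(t) = exp(-t).
t = [0.0, 0.25, 0.5, 0.75, 1.0]
y = [2.0, 1.5, 1.2, 1.0, 0.9]

# Each column of A is one basis function evaluated at all the tᵢ.
A = [ones(length(t)) t exp.(-t)]

c = A \ y          # coefficients c₁, c₂, c₃ minimizing ‖y - A*c‖₂
norm(y - A * c)    # the minimized residual 2-norm
```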

3.1.2 Change of variables

The most familiar and common case of a polynomial least-squares fit is the straight line, $f(t) = c_1 + c_2 t$. Certain other fit functions can be transformed into this situation. For example, suppose we want to fit data using $g(t)= a_1 e^{a_2 t}$. Then

$$
\log y \approx \log g(t) = (\log a_1) + a_2 t = c_1 + c_2 t.
$$

While the fit of the $y_i$ to $g(t)$ depends nonlinearly on the fitting parameters, the fit of $\log y$ to a straight line is a linear problem. Similarly, the power-law relationship $y\approx f(t)=a_1 t^{a_2}$ is equivalent to

$$
\log y \approx (\log a_1) + a_2 (\log t).
$$

Thus, the variable $z=\log y$ can be fit linearly in terms of the variable $s=\log t$. In practice these two cases (exponential fit and power law) are easily detected by using log-linear or log-log plots, respectively.
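A minimal sketch of both transformed fits, with made-up data chosen to roughly follow $y = e^t$:

```julia
# Made-up data, roughly y = eᵗ.
t = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.7, 7.2, 20.5, 54.8, 148.9]

# Exponential fit: log y ≈ c₁ + c₂ t, so a₁ = exp(c₁) and a₂ = c₂.
c = [ones(length(t)) t] \ log.(y)
a₁, a₂ = exp(c[1]), c[2]

# Power-law fit: fit z = log y linearly against s = log t.
s = log.(t)
c = [ones(length(s)) s] \ log.(y)
```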

3.1.3 Exercises