
Newton for nonlinear systems

The rootfinding problem becomes much more difficult when multiple variables and equations are involved.

Particular problems are often posed using scalar variables and equations, but it is convenient to collect them into a single vector-valued function $\mathbf{f}$ of a vector variable $\mathbf{x}$.

While the equations of Example 4.5.1 are easy to solve by hand, in practice even establishing the existence and uniqueness of solutions for any particular system is typically quite difficult.

4.5.1 Linear model

To extend rootfinding methods to systems, we will keep to the basic philosophy of constructing easily managed models of the exact function. As usual, the starting point is a linear model. We first need to define what it means to take a derivative of a vector-valued function of a vector variable.

A multidimensional Taylor series begins with the linear approximation

$$
\mathbf{f}(\mathbf{x}+\mathbf{h}) = \mathbf{f}(\mathbf{x}) + \mathbf{J}(\mathbf{x})\,\mathbf{h} + O(\| \mathbf{h} \|^2), \tag{4.5.7}
$$

where $\mathbf{J}$ is the Jacobian matrix of $\mathbf{f}$ at $\mathbf{x}$, whose entries are the partial derivatives $J_{ij}(\mathbf{x}) = \partial f_i/\partial x_j$, and $\mathbf{h}$ is a small perturbation from $\mathbf{x}$.

The terms $\mathbf{f}(\mathbf{x})+\mathbf{J}(\mathbf{x})\mathbf{h}$ in (4.5.7) represent the linear part of $\mathbf{f}$ near $\mathbf{x}$, while the $O(\| \mathbf{h} \|^2)$ term represents the omitted higher-order terms in the Taylor series. If $\mathbf{f}$ is actually linear, i.e., $\mathbf{f}(\mathbf{x})=\mathbf{A}\mathbf{x}-\mathbf{b}$, then the Jacobian matrix is the constant matrix $\mathbf{A}$ and the higher-order terms in (4.5.7) disappear.
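To make the linear model concrete, here is a minimal numerical check of (4.5.7), written in Julia (assumed here as the working language) for a small hypothetical system; the function, its Jacobian, and the test point are invented for illustration. The residual of the linear model should shrink roughly like $\| \mathbf{h} \|^2$ as the perturbation is reduced.

```julia
# Hypothetical 2×2 system used only to illustrate (4.5.7); not from the text.
using LinearAlgebra

f(x) = [ x[1]^2 - x[2],
         x[1] + sin(x[2]) ]
J(x) = [ 2*x[1]  -1.0;               # Jacobian entries J[i,j] = ∂f_i/∂x_j
         1.0      cos(x[2]) ]

x = [1.0, 2.0]
for δ in (1e-1, 1e-2, 1e-3)
    h = δ * [1.0, -1.0]
    resid = norm( f(x + h) - (f(x) + J(x) * h) )   # size of the omitted O(‖h‖²) term
    println("‖h‖ = ", norm(h), "   residual = ", resid)
end
```

Each tenfold reduction in $\| \mathbf{h} \|$ should reduce the residual by roughly a factor of 100, consistent with the quadratic remainder term.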

4.5.2 The multidimensional Newton iteration

With a method in hand for constructing a linear model for the vector system $\mathbf{f}(\mathbf{x})$, we can generalize Newton’s method. Specifically, at a root estimate $\mathbf{x}_k$, we set $\mathbf{h} = \mathbf{x}-\mathbf{x}_k$ in (4.5.7) and get

$$
\mathbf{f}(\mathbf{x}) \approx \mathbf{q}(\mathbf{x}) = \mathbf{f}(\mathbf{x}_k) + \mathbf{J}(\mathbf{x}_k)(\mathbf{x}-\mathbf{x}_k).
$$

We define the next iteration value $\mathbf{x}_{k+1}$ by requiring $\mathbf{q}(\mathbf{x}_{k+1})=\boldsymbol{0}$,

$$
\boldsymbol{0} = \mathbf{f}(\mathbf{x}_k) + \mathbf{J}(\mathbf{x}_k)(\mathbf{x}_{k+1}-\mathbf{x}_k),
$$

which can be rearranged into

$$
\mathbf{x}_{k+1} = \mathbf{x}_k - \bigl[\mathbf{J}(\mathbf{x}_k)\bigr]^{-1} \mathbf{f}(\mathbf{x}_k).
$$

Note that $\mathbf{J}^{-1}\mathbf{f}$ now plays the role that $f/f'$ had in the scalar case; in fact, the two are the same in one dimension. In computational practice, however, we don’t compute matrix inverses; instead, each iteration obtains the Newton step $\mathbf{s}_k$ by solving the linear system $\mathbf{J}(\mathbf{x}_k)\,\mathbf{s}_k = -\mathbf{f}(\mathbf{x}_k)$ and then sets $\mathbf{x}_{k+1} = \mathbf{x}_k + \mathbf{s}_k$.
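For concreteness, here is a minimal sketch of a single Newton step computed this way, again in Julia with an invented 2×2 system; the step comes from a linear solve with backslash rather than from an explicit inverse.

```julia
# One Newton step for a hypothetical system; not taken from the text.
using LinearAlgebra

f(x) = [ x[1]^2 + x[2]^2 - 1,        # hypothetical test system f(x) = 0
         x[2] - x[1]^2 ]
J(x) = [ 2*x[1]   2*x[2];            # its Jacobian, J[i,j] = ∂f_i/∂x_j
        -2*x[1]   1.0 ]

xk  = [1.0, 1.0]                     # current root estimate x_k
s   = -( J(xk) \ f(xk) )             # Newton step: solve J(x_k) s = -f(x_k)
xk1 = xk + s                         # next estimate x_{k+1}
```

Repeating this update from a sufficiently good starting guess is the entire method.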

An extension of our series analysis of the scalar Newton’s method shows that the vector version is also quadratically convergent in any vector norm, under suitable circumstances and when the iteration converges at all.

4.5.3 Implementation

An implementation of Newton’s method for systems is given in Function 4.5.2. Other than computing the Newton step using backslash and taking vector magnitudes with norm, Function 4.5.2 is virtually identical to the scalar version Function 4.3.1 presented earlier.
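Function 4.5.2 itself is not reproduced here. As a stand-in, the sketch below follows the structure the text describes, using backslash for the Newton step and norm for vector magnitudes; the function name, stopping tolerances, and test system are all invented for illustration.

```julia
# A minimal Newton-for-systems sketch (not the book's Function 4.5.2).
using LinearAlgebra

function newtonsys_sketch(f, J, x1; maxiter=40, xtol=1e-12, ftol=1e-12)
    x = float(x1)                            # current estimate x_k
    for k in 1:maxiter
        s = -( J(x) \ f(x) )                 # Newton step via backslash
        x = x + s
        if norm(s) < xtol || norm(f(x)) < ftol   # assumed stopping criteria
            break
        end
    end
    return x
end

# Usage with a hypothetical test system (a circle intersecting a parabola):
f(x) = [ x[1]^2 + x[2]^2 - 1, x[2] - x[1]^2 ]
J(x) = [ 2*x[1]  2*x[2]; -2*x[1]  1.0 ]
x = newtonsys_sketch(f, J, [1.0, 1.0])
println("root ≈ ", x, "   ‖f(x)‖ = ", norm(f(x)))
```

Printing the residual norm inside the loop would show it roughly squaring from one iteration to the next, consistent with the quadratic convergence noted above.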

4.5.4 Exercises