
Newton for nonlinear systems

The rootfinding problem becomes much more difficult when multiple variables and equations are involved.

Particular problems are often posed using scalar variables and equations, but it is convenient to collect them into a single vector-valued function $\mathbf{f}$ of a vector variable $\mathbf{x}$.

While the equations of Example 4.5.1 are easy to solve by hand, in practice even establishing the existence and uniqueness of solutions for any particular system is typically quite difficult.

4.5.1 Linear model

To extend rootfinding methods to systems, we will keep to the basic philosophy of constructing easily managed models of the exact function. As usual, the starting point is a linear model. We first need to define what it means to take a derivative of a vector-valued function of a vector variable.

A multidimensional Taylor series begins with the linear approximation

$$
\mathbf{f}(\mathbf{x}+\mathbf{h}) = \mathbf{f}(\mathbf{x}) + \mathbf{J}(\mathbf{x})\,\mathbf{h} + O(\| \mathbf{h} \|^2), \tag{4.5.7}
$$

where $\mathbf{J}$ is the Jacobian matrix of $\mathbf{f}$ at $\mathbf{x}$, whose entries are the partial derivatives $J_{ij}(\mathbf{x}) = \partial f_i/\partial x_j$, and $\mathbf{h}$ is a small perturbation from $\mathbf{x}$.

The terms $\mathbf{f}(\mathbf{x})+\mathbf{J}(\mathbf{x})\mathbf{h}$ in (4.5.7) represent the linear part of $\mathbf{f}$ near $\mathbf{x}$, while the $O(\| \mathbf{h} \|^2)$ term represents the omitted higher-order terms in the Taylor series. If $\mathbf{f}$ is actually linear, i.e., $\mathbf{f}(\mathbf{x})=\mathbf{A}\mathbf{x}-\mathbf{b}$, then the Jacobian matrix is the constant matrix $\mathbf{A}$ and the higher-order terms in (4.5.7) disappear.
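To make the linear model concrete, here is a minimal numerical check of (4.5.7), written in Julia (assumed here as the working language) for a small hypothetical system; the function, its Jacobian, and the test point are invented for illustration. The residual of the linear model should shrink roughly like $\| \mathbf{h} \|^2$ as the perturbation is reduced.

```julia
# Hypothetical 2×2 system used only to illustrate (4.5.7); not from the text.
using LinearAlgebra

f(x) = [ x[1]^2 - x[2],
         x[1] + sin(x[2]) ]
J(x) = [ 2*x[1]  -1.0;               # Jacobian entries J[i,j] = ∂f_i/∂x_j
         1.0      cos(x[2]) ]

x = [1.0, 2.0]
for δ in (1e-1, 1e-2, 1e-3)
    h = δ * [1.0, -1.0]
    resid = norm( f(x + h) - (f(x) + J(x) * h) )   # size of the omitted O(‖h‖²) term
    println("‖h‖ = ", norm(h), "   residual = ", resid)
end
```

Each tenfold reduction in $\| \mathbf{h} \|$ should reduce the residual by roughly a factor of 100, consistent with the quadratic remainder term.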

4.5.2 The multidimensional Newton iteration

With a method in hand for constructing a linear model for the vector system $\mathbf{f}(\mathbf{x})$, we can generalize Newton’s method. Specifically, at a root estimate $\mathbf{x}_k$, we set $\mathbf{h} = \mathbf{x}-\mathbf{x}_k$ in (4.5.7) and get

$$
\mathbf{f}(\mathbf{x}) \approx \mathbf{q}(\mathbf{x}) = \mathbf{f}(\mathbf{x}_k) + \mathbf{J}(\mathbf{x}_k)(\mathbf{x}-\mathbf{x}_k).
$$

We define the next iteration value $\mathbf{x}_{k+1}$ by requiring $\mathbf{q}(\mathbf{x}_{k+1})=\boldsymbol{0}$,

$$
\boldsymbol{0} = \mathbf{f}(\mathbf{x}_k) + \mathbf{J}(\mathbf{x}_k)(\mathbf{x}_{k+1}-\mathbf{x}_k),
$$

which can be rearranged into

$$
\mathbf{x}_{k+1} = \mathbf{x}_k - \bigl[\mathbf{J}(\mathbf{x}_k)\bigr]^{-1} \mathbf{f}(\mathbf{x}_k).
$$

Note that $\mathbf{J}^{-1}\mathbf{f}$ now plays the role that $f/f'$ had in the scalar case; in fact, the two are the same in one dimension. In computational practice, however, we don’t compute matrix inverses; instead, each iteration obtains the Newton step $\mathbf{s}_k$ by solving the linear system $\mathbf{J}(\mathbf{x}_k)\,\mathbf{s}_k = -\mathbf{f}(\mathbf{x}_k)$ and then sets $\mathbf{x}_{k+1} = \mathbf{x}_k + \mathbf{s}_k$.
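For concreteness, here is a minimal sketch of a single Newton step computed this way, again in Julia with an invented 2×2 system; the step comes from a linear solve with backslash rather than from an explicit inverse.

```julia
# One Newton step for a hypothetical system; not taken from the text.
using LinearAlgebra

f(x) = [ x[1]^2 + x[2]^2 - 1,        # hypothetical test system f(x) = 0
         x[2] - x[1]^2 ]
J(x) = [ 2*x[1]   2*x[2];            # its Jacobian, J[i,j] = ∂f_i/∂x_j
        -2*x[1]   1.0 ]

xk  = [1.0, 1.0]                     # current root estimate x_k
s   = -( J(xk) \ f(xk) )             # Newton step: solve J(x_k) s = -f(x_k)
xk1 = xk + s                         # next estimate x_{k+1}
```

Repeating this update from a sufficiently good starting guess is the entire method.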

An extension of our series analysis of the scalar Newton’s method shows that the vector version is also quadratically convergent in any vector norm, under suitable circumstances and when the iteration converges at all.

4.5.3 Implementation

An implementation of Newton’s method for systems is given in Function 4.5.2. Other than computing the Newton step using backslash and taking vector magnitudes with norm, Function 4.5.2 is virtually identical to the scalar version Function 4.3.1 presented earlier.
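Function 4.5.2 itself is not reproduced here. As a stand-in, the sketch below follows the structure the text describes, using backslash for the Newton step and norm for vector magnitudes; the function name, stopping tolerances, and test system are all invented for illustration.

```julia
# A minimal Newton-for-systems sketch (not the book's Function 4.5.2).
using LinearAlgebra

function newtonsys_sketch(f, J, x1; maxiter=40, xtol=1e-12, ftol=1e-12)
    x = float(x1)                            # current estimate x_k
    for k in 1:maxiter
        s = -( J(x) \ f(x) )                 # Newton step via backslash
        x = x + s
        if norm(s) < xtol || norm(f(x)) < ftol   # assumed stopping criteria
            break
        end
    end
    return x
end

# Usage with a hypothetical test system (a circle intersecting a parabola):
f(x) = [ x[1]^2 + x[2]^2 - 1, x[2] - x[1]^2 ]
J(x) = [ 2*x[1]  2*x[2]; -2*x[1]  1.0 ]
x = newtonsys_sketch(f, J, [1.0, 1.0])
println("root ≈ ", x, "   ‖f(x)‖ = ", norm(f(x)))
```

Printing the residual norm inside the loop would show it roughly squaring from one iteration to the next, consistent with the quadratic convergence noted above.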

4.5.4 Exercises