Chapter 3

Functions

Solution of least squares by the normal equations
lsnormal.jl
"""
    lsnormal(A, b)

Solve a linear least-squares problem by the normal equations.
Returns the minimizer of ||b-Ax||.
"""
function lsnormal(A, b)
    N = A' * A
    z = A' * b
    R = cholesky(N).U
    w = forwardsub(R', z)                   # solve R'w = z
    x = backsub(R, w)                       # solve Rx = w
    return x
end
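As a quick self-contained check (added here, not part of the text's code), the same steps can be run with Julia's built-in triangular solves standing in for the helper routines forwardsub and backsub used above:

```julia
using LinearAlgebra

A = [1.0 1; 1 2; 1 3; 1 4]    # small overdetermined system
b = [2.0, 3, 5, 6]

N = A' * A              # normal equations matrix; SPD when A has full rank
z = A' * b
R = cholesky(N).U       # N = R'R with R upper triangular
w = R' \ z              # forward substitution for R'w = z
x = R \ w               # backward substitution for Rx = w

x ≈ A \ b               # matches the built-in least-squares solve
```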
Solution of least squares by QR factorization
lsqrfact.jl
"""
    lsqrfact(A, b)

Solve a linear least-squares problem by QR factorization. Returns
the minimizer of ||b-Ax||.
"""
function lsqrfact(A, b)
    Q, R = qr(A)
    c = Matrix(Q)' * b
    x = backsub(R, c)
    return x
end
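Again as a supplementary sketch, the body of this function can be reproduced with the built-in qr and a triangular backslash in place of backsub:

```julia
using LinearAlgebra

A = [1.0 0; 1 1; 1 2; 1 3]
b = [1.0, 2, 2, 4]

Q, R = qr(A)
c = Matrix(Q)' * b      # thin Q, so c has length 2 here
x = R \ c               # upper triangular solve

x ≈ A \ b
```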
QR factorization by Householder reflections
qrfact.jl
"""
    qrfact(A)

QR factorization by Householder reflections. Returns Q and R.
"""
function qrfact(A)
    m, n = size(A)
    Qt = diagm(ones(m))
    R = float(copy(A))
    for k in 1:n
        z = R[k:m, k]
        w = [-sign(z[1]) * norm(z) - z[1]; -z[2:end]]
        nrmw = norm(w)
        if nrmw < eps()
            continue    # already in place; skip this iteration
        end
        v = w / nrmw
        # Apply the reflection to each relevant column of R and Q
        for j in k:n
            R[k:m, j] -= v * (2 * (v' * R[k:m, j]))
        end
        for j in 1:m
            Qt[k:m, j] -= v * (2 * (v' * Qt[k:m, j]))
        end
    end
    return Qt', triu(R)
end

Examples

3.1 Fitting functions to data

Example 3.1.1

Here are 5-year averages of the worldwide temperature anomaly as compared to the 1951–1980 average (source: NASA).

using Plots
year = 1955:5:2000
temp = [ -0.0480, -0.0180, -0.0360, -0.0120, -0.0040,
       0.1180, 0.2100, 0.3320, 0.3340, 0.4560 ]
    
scatter(year, temp, label="data",
    xlabel="year", ylabel="anomaly (degrees C)", 
    legend=:bottomright)

A polynomial interpolant can be used to fit the data. Here we build one using a Vandermonde matrix. First, though, we express time as decades since 1950, as it improves the condition number of the matrix.

t = @. (year - 1950) / 10
n = length(t)
V = [ t[i]^j for i in 1:n, j in 0:n-1 ]
c = V \ temp
10-element Vector{Float64}:
  -14.114000001832462
   76.36173810552113
 -165.45597224550528
  191.96056669514388
 -133.27347224319684
   58.015577787494486
  -15.962888891734785
    2.6948063497166928
   -0.2546666667177082
    0.010311111113288083

The coefficients in vector c are used to create a polynomial. Then we create a function that evaluates the polynomial after changing the time variable as we did for the Vandermonde matrix.

using Polynomials, Plots
p = Polynomial(c)
f = yr -> p((yr - 1950) / 10)
plot!(f, 1955, 2000, label="interpolant")

As you can see, the interpolant does represent the data, in a sense. However, it's a crazy-looking curve for the application. Trying too hard to reproduce all the data exactly is known as overfitting.

Example 3.1.2

Here are the 5-year temperature averages again.

year = 1955:5:2000
t = @. (year - 1950) / 10
temp = [ -0.0480, -0.0180, -0.0360, -0.0120, -0.0040,
          0.1180, 0.2100, 0.3320, 0.3340, 0.4560 ]
10-element Vector{Float64}:
 -0.048
 -0.018
 -0.036
 -0.012
 -0.004
  0.118
  0.21
  0.332
  0.334
  0.456

The standard best-fit line results from using a linear polynomial that meets the least-squares criterion.

V = [ t.^0 t ]    # Vandermonde-ish matrix
@show size(V)
c = V \ temp
p = Polynomial(c)
size(V) = (10, 2)

f = yr -> p((yr - 1950) / 10)
scatter(year, temp, label="data",
    xlabel="year", ylabel="anomaly (degrees C)", leg=:bottomright)
plot!(f, 1955, 2000, label="linear fit")

If we use a global cubic polynomial, the points are fit more closely.

V = [ t[i]^j for i in 1:length(t), j in 0:3 ]   
@show size(V);
size(V) = (10, 4)

Now we solve the new least-squares problem to redefine the fitting polynomial. Because f refers to the global binding of p, it automatically picks up the new coefficients.

p = Polynomial( V \ temp )
plot!(f, 1955, 2000, label="cubic fit")

If we were to continue increasing the degree of the polynomial, the residual at the data points would get smaller, but overfitting would increase.
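That claim is easy to check numerically. The following sketch (added here, not from the text) fits polynomials of increasing degree and records the residual norm, which can only shrink as columns are added to the fitting matrix:

```julia
using LinearAlgebra

year = 1955:5:2000
t = @. (year - 1950) / 10
temp = [ -0.0480, -0.0180, -0.0360, -0.0120, -0.0040,
          0.1180, 0.2100, 0.3320, 0.3340, 0.4560 ]

resid = Float64[]
for d in 1:6
    V = [ s^j for s in t, j in 0:d ]    # degree-d fitting matrix
    c = V \ temp
    push!(resid, norm(temp - V * c))    # least-squares residual norm
end
resid    # non-increasing in the degree
```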

Example 3.1.3

Since $\sum_{k\ge 1} 1/k^2 = \pi^2/6$, the square roots of the scaled partial sums, $p_k = \sqrt{6 s_k}$, form a sequence that converges to π.

using Plots, LaTeXStrings
a = [1/k^2 for k=1:100]
s = cumsum(a)        # cumulative summation
p = @. sqrt(6*s)

scatter(1:100, p;
    title="Sequence convergence",
    xlabel=L"k",  ylabel=L"p_k")

This graph suggests that maybe $p_k \to \pi$, but it's far from clear how close the sequence gets. It's more informative to plot the sequence of errors, $\epsilon_k = |\pi - p_k|$. By plotting the error sequence on a log-log scale, we can see a nearly linear relationship.

ϵ = @. abs(π - p)    # error sequence
scatter(1:100, ϵ;
    title="Convergence of errors",
    xaxis=(:log10,L"k"),  yaxis=(:log10,"error"))

The straight line on the log-log scale suggests a power-law relationship where $\epsilon_k \approx a k^b$, or $\log \epsilon_k \approx b (\log k) + \log a$.

k = 1:100
V = [ k.^0 log.(k) ]     # fitting matrix
c = V \ log.(ϵ)          # coefficients of linear fit
2-element Vector{Float64}:
 -0.1823752497282998
 -0.9674103233127929

In terms of the parameters $a$ and $b$ used above, we have

a, b = exp(c[1]), c[2];
@show b;
b = -0.9674103233127929

It’s tempting to conjecture that the slope $b \to -1$ asymptotically. Here is how the numerical fit compares to the original convergence curve.

plot!(k, a * k.^b, l=:dash, label="power-law fit")
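One added way to probe that conjecture (not a check from the text): refit the power law using only the tail of the sequence, where the asymptotic behavior dominates.

```julia
using LinearAlgebra

a_seq = [ 1/k^2 for k in 1:100 ]
p = sqrt.(6 .* cumsum(a_seq))
ϵ = abs.(π .- p)

# least-squares slope of log(ϵ) vs log(k) over a given range of k
fitslope(rng) = ( [ones(length(rng)) log.(rng)] \ log.(ϵ[rng]) )[2]

b_full = fitslope(1:100)
b_tail = fitslope(51:100)    # closer to -1 than the full-range fit
```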

3.2 The normal equations

Example 3.2.1

Because the functions $\sin^2(t)$, $\cos^2(t)$, and 1 are linearly dependent, we should find that the following matrix is somewhat ill-conditioned.

t = range(0, 3, 400)
f = [ x -> sin(x)^2, x -> cos((1 + 1e-7) * x)^2, x -> 1. ]
A = [ f(t) for t in t, f in f ]
@show κ = cond(A);
κ = cond(A) = 1.8253225428206295e7

Now we set up an artificial linear least-squares problem with a known exact solution that actually makes the residual zero.

x = [1., 2, 1]
b = A * x;

Using backslash to find the least-squares solution, we get a relative error that is well below $\kappa$ times machine epsilon.

x_BS = A \ b
@show observed_error = norm(x_BS - x) / norm(x);
@show error_bound = κ * eps();
observed_error = norm(x_BS - x) / norm(x) = 1.7978982202463188e-10
error_bound = κ * eps() = 4.053030228813602e-9

If we formulate and solve via the normal equations, we get a much larger relative error. With $\kappa^2 \approx 10^{14}$, we may not be left with more than about 2 accurate digits.

N = A' * A
x_NE = N \ (A'*b)
@show observed_err = norm(x_NE - x) / norm(x);
@show digits = -log10(observed_err);
observed_err = norm(x_NE - x) / norm(x) = 0.036205099396071284
digits = -(log10(observed_err)) = 1.441230255886605
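The loss of accuracy here is predictable: the singular values of A'A are the squares of those of A, so forming the normal equations squares the condition number. A quick supplementary check:

```julia
using LinearAlgebra

t = range(0, 3, 400)
f = [ x -> sin(x)^2, x -> cos((1 + 1e-7) * x)^2, x -> 1. ]
A = [ f(t) for t in t, f in f ]

κ_A = cond(A)
κ_N = cond(A' * A)
log10(κ_N) / log10(κ_A)    # close to 2
```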

3.3 The QR factorization

Example 3.3.1

Julia provides access to both the thin and full forms of the QR factorization.

A = rand(1.:9., 6, 4)
@show m,n = size(A);
(m, n) = size(A) = (6, 4)

Here is a standard call:

Q,R = qr(A);
Q
6×6 LinearAlgebra.QRCompactWYQ{Float64, Matrix{Float64}, Matrix{Float64}}
R
4×4 Matrix{Float64}:
 -13.1909  -10.5376   -14.1764   -7.50517
   0.0      -1.39992   -4.01091  -2.79573
   0.0       0.0       -6.47621  -1.29551
   0.0       0.0        0.0      -5.58372

If you look carefully, you see that we seemingly got a full $\mathbf{Q}$ but a thin $\mathbf{R}$. However, the $\mathbf{Q}$ above is not a standard matrix type. If you convert it to a true matrix, then it reverts to the thin form.

Q̂ = Matrix(Q)
6×4 Matrix{Float64}:
 -0.530669   0.422849  -0.335539   0.400321
 -0.530669  -0.291479   0.415688   0.404591
 -0.227429  -0.43106   -0.316068   0.236669
 -0.530669   0.422849   0.127695  -0.602617
 -0.227429  -0.43106   -0.62489   -0.408047
 -0.227429  -0.43106    0.455989  -0.300645

We can test that $\mathbf{Q}$ is an orthogonal matrix:

opnorm(Q' * Q - I)
5.284739539913396e-16

The thin $\hat{\mathbf{Q}}$ cannot be an orthogonal matrix, because it is not square, but it is still ONC:

Q̂' * Q̂ - I
4×4 Matrix{Float64}:
  0.0           1.6454e-16   -1.49328e-16   3.78178e-17
  1.6454e-16   -1.11022e-16   1.03859e-16  -8.56754e-17
 -1.49328e-16   1.03859e-16   2.22045e-16   3.12408e-17
  3.78178e-17  -8.56754e-17   3.12408e-17   2.22045e-16
Example 3.3.2

We’ll repeat the experiment of Example 3.2.1, which exposed instability in the normal equations.

t = range(0, 3, 400)
f = [ x -> sin(x)^2, x -> cos((1 + 1e-7) * x)^2, x -> 1. ]
A = [ f(t) for t in t, f in f ]
x = [1., 2, 1]
b = A * x;

The error in the solution by Function 3.3.2 is similar to the bound predicted by the condition number.

observed_error = norm(FNC.lsqrfact(A, b) - x) / norm(x);
@show observed_error;
@show error_bound = cond(A) * eps();
observed_error = 4.516380901323573e-10
error_bound = cond(A) * eps() = 4.053030228813602e-9
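For a side-by-side summary (a supplementary sketch using only built-in solvers), compare backslash on the rectangular matrix, which factors by QR, against the normal equations on the same data:

```julia
using LinearAlgebra

t = range(0, 3, 400)
f = [ x -> sin(x)^2, x -> cos((1 + 1e-7) * x)^2, x -> 1. ]
A = [ f(t) for t in t, f in f ]
x = [1., 2, 1]
b = A * x

err_qr = norm(A \ b - x) / norm(x)                 # QR-based solve
err_ne = norm((A' * A) \ (A' * b) - x) / norm(x)   # normal equations
err_ne / err_qr    # many orders of magnitude larger
```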

3.4 Computing QR factorizations

Example 3.4.1

We will use Householder reflections to produce a QR factorization of a random matrix.

A = rand(float(1:9), 6, 4)
m,n = size(A)
(6, 4)

Our first step is to introduce zeros below the diagonal in column 1 by using (3.4.4) and (3.4.1).

z = A[:, 1];
v = normalize(z - norm(z) * [1; zeros(m-1)])
P₁ = I - 2v * v'   # reflector
6×6 Matrix{Float64}:
 0.336336   0.605406   0.470871   0.336336   0.336336   0.269069
 0.605406   0.447739  -0.429537  -0.306812  -0.306812  -0.24545
 0.470871  -0.429537   0.665916  -0.238631  -0.238631  -0.190905
 0.336336  -0.306812  -0.238631   0.829549  -0.170451  -0.136361
 0.336336  -0.306812  -0.238631  -0.170451   0.829549  -0.136361
 0.269069  -0.24545   -0.190905  -0.136361  -0.136361   0.890911

We check that this reflector introduces zeros as it should:

P₁ * z
6-element Vector{Float64}:
 14.866068747318506
  2.886579864025407e-15
  8.881784197001252e-16
  1.5543122344752192e-15
  1.2212453270876722e-15
  8.881784197001252e-16

Now we replace $\mathbf{A}$ by $\mathbf{P}\mathbf{A}$.

A = P₁ * A
6×4 Matrix{Float64}:
 14.8661        7.1976     8.07207    11.9063
  2.88658e-15  -1.00469   -2.53905     4.34881
  8.88178e-16   0.440798  -1.30815     1.93797
  1.55431e-15   5.88628   -0.0772508   1.52712
  1.22125e-15   2.88628    3.92275     0.527119
  8.88178e-16   0.109028   0.538199    2.8217

We are set to put zeros into column 2. We must not use row 1 in any way, lest it destroy the zeros we just introduced. So we leave it out of the next reflector.

z = A[2:m, 2]
v = normalize(z - norm(z) * [1; zeros(m-2)])
P₂ = I - 2v * v'
5×5 Matrix{Float64}:
 -0.151129    0.0663064     0.885435    0.434165     0.0164003
  0.0663064   0.996181     -0.0510021  -0.0250084   -0.000944677
  0.885435   -0.0510021     0.318933   -0.333955    -0.0126149
  0.434165   -0.0250084    -0.333955    0.836248    -0.00618561
  0.0164003  -0.000944677  -0.0126149  -0.00618561   0.999766

We now apply this reflector to rows 2 and below only.

A[2:m, :] = P₂ * A[2:m, :]
A
6×4 Matrix{Float64}:
 14.8661        7.1976        8.07207   11.9063
  1.54368e-15   6.6479        1.94053    1.09857
  9.65531e-16   2.60422e-18  -1.56618    2.12519
  2.58726e-15  -2.51747e-16  -3.52289    4.02717
  1.72774e-15   1.35625e-16   2.23321    1.753
  9.07311e-16  -5.0709e-20    0.474378   2.868

We need to iterate the process for the last two columns.

for j in 3:n
    z = A[j:m, j]
    v = normalize(z - norm(z) * [1; zeros(m-j)])
    P = I - 2v * v'
    A[j:m, :] = P * A[j:m, :]
end

We have now reduced the original to an upper triangular matrix using four orthogonal Householder reflections:

R = triu(A)
6×4 Matrix{Float64}:
 14.8661  7.1976  8.07207  11.9063
  0.0     6.6479  1.94053   1.09857
  0.0     0.0     4.48062  -2.73185
  0.0     0.0     0.0       4.95681
  0.0     0.0     0.0       0.0
  0.0     0.0     0.0       0.0
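Putting the steps together (an added sketch on a fixed small matrix, using the same sign convention as the steps above): if each reflection is also accumulated into a matrix Qt, the product recovers both factors, which we can verify directly.

```julia
using LinearAlgebra

A = [ 2.0 1 1; 1 3 1; 1 1 4; 1 1 1; 1 1 1; 2 1 3 ]
A₀ = copy(A)
m, n = size(A)
Qt = Matrix{Float64}(I, m, m)

for j in 1:n
    z = A[j:m, j]
    v = normalize(z - norm(z) * [1; zeros(m - j)])
    P = I - 2v * v'                  # Householder reflector for rows j:m
    A[j:m, :] = P * A[j:m, :]
    Qt[j:m, :] = P * Qt[j:m, :]      # accumulate the same reflection
end

R = triu(A)
Q = Qt'
opnorm(Q * R - A₀), opnorm(Q' * Q - I)    # both near machine epsilon
```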