Functions¶
Solution of least squares by the normal equations
```julia
"""
    lsnormal(A, b)

Solve a linear least-squares problem by the normal equations.
Returns the minimizer of ||b-Ax||.
"""
function lsnormal(A, b)
    N = A' * A
    z = A' * b
    R = cholesky(N).U
    w = forwardsub(R', z)    # solve R'w = z
    x = backsub(R, w)        # solve Rx = w
    return x
end
```
About the code
The syntax cholesky(N).U is a field reference, extracting the upper triangular matrix we want from the structure returned by cholesky.
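As a minimal illustration of that field reference (the small SPD matrix here is made up for the purpose), cholesky returns a factorization object rather than a plain matrix, and its U field holds the upper triangular factor:

```julia
using LinearAlgebra

N = [4.0 2.0; 2.0 3.0]    # small symmetric positive definite matrix
F = cholesky(N)           # factorization object, not a plain matrix
R = F.U                   # field reference: the upper triangular factor
@show istriu(R)           # R is upper triangular
@show R' * R ≈ N          # and R'R reproduces N
```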
Solution of least squares by QR factorization
```julia
"""
    lsqrfact(A, b)

Solve a linear least-squares problem by QR factorization.
Returns the minimizer of ||b-Ax||.
"""
function lsqrfact(A, b)
    Q, R = qr(A)
    c = Matrix(Q)' * b
    x = backsub(R, c)
    return x
end
```
QR factorization by Householder reflections
```julia
"""
    qrfact(A)

QR factorization by Householder reflections. Returns Q and R.
"""
function qrfact(A)
    m, n = size(A)
    Qt = diagm(ones(m))
    R = float(copy(A))
    for k in 1:n
        z = R[k:m, k]
        w = [-sign(z[1]) * norm(z) - z[1]; -z[2:end]]
        nrmw = norm(w)
        if nrmw < eps()
            continue    # already in place; skip this iteration
        end
        v = w / nrmw
        # Apply the reflection to each relevant column of R and Q
        for j in k:n
            R[k:m, j] -= v * (2 * (v' * R[k:m, j]))
        end
        for j in 1:m
            Qt[k:m, j] -= v * (2 * (v' * Qt[k:m, j]))
        end
    end
    return Qt', triu(R)
end
```
Examples¶
3.1 Fitting functions to data¶
Example 3.1.1
Here are 5-year averages of the worldwide temperature anomaly as compared to the 1951–1980 average (source: NASA).
```julia
year = 1955:5:2000
temp = [ -0.0480, -0.0180, -0.0360, -0.0120, -0.0040,
          0.1180,  0.2100,  0.3320,  0.3340,  0.4560 ]
scatter(year, temp, label="data",
    xlabel="year", ylabel="anomaly (degrees C)",
    legend=:bottomright)
```
A polynomial interpolant can be used to fit the data. Here we build one using a Vandermonde matrix. First, though, we express time as decades since 1950, as it improves the condition number of the matrix.
```julia
t = @. (year - 1950) / 10
n = length(t)
V = [ t[i]^j for i in 1:n, j in 0:n-1 ]
c = V \ temp
```
```
10-element Vector{Float64}:
  -14.114000001832462
   76.36173810552113
 -165.45597224550528
  191.96056669514388
 -133.27347224319684
   58.015577787494486
  -15.962888891734785
    2.6948063497166928
   -0.2546666667177082
    0.010311111113288083
```
The coefficients in vector c are used to create a polynomial. Then we create a function that evaluates the polynomial after changing the time variable as we did for the Vandermonde matrix.
Tip
If you plot a function, then the points are chosen automatically to make a smooth curve.
```julia
using Polynomials, Plots
p = Polynomial(c)
f = yr -> p((yr - 1950) / 10)
plot!(f, 1955, 2000, label="interpolant")
```
As you can see, the interpolant does represent the data, in a sense. However, it's a crazy-looking curve for the application. Trying too hard to reproduce all the data exactly is known as overfitting.
Example 3.1.2
Here are the 5-year temperature averages again.
```julia
year = 1955:5:2000
t = @. (year - 1950) / 10
temp = [ -0.0480, -0.0180, -0.0360, -0.0120, -0.0040,
          0.1180,  0.2100,  0.3320,  0.3340,  0.4560 ]
```
```
10-element Vector{Float64}:
 -0.048
 -0.018
 -0.036
 -0.012
 -0.004
  0.118
  0.21
  0.332
  0.334
  0.456
```
The standard best-fit line results from using a linear polynomial that meets the least-squares criterion.
Tip
Backslash solves overdetermined linear systems in a least-squares sense.
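A tiny sketch of that behavior, using made-up data: for a tall matrix, backslash returns the same minimizer that the normal equations produce.

```julia
using LinearAlgebra

A = [1.0 1; 1 2; 1 3; 1 4]    # 4 equations, 2 unknowns: overdetermined
b = [1.0, 1.9, 3.2, 3.9]
x = A \ b                     # least-squares solution via backslash
x_NE = (A' * A) \ (A' * b)    # same minimizer via the normal equations
@show x ≈ x_NE
```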
```julia
V = [ t.^0 t ]    # Vandermonde-ish matrix
@show size(V)
c = V \ temp
p = Polynomial(c)
```
```
size(V) = (10, 2)
```
```julia
f = yr -> p((yr - 1950) / 10)
scatter(year, temp, label="data",
    xlabel="year", ylabel="anomaly (degrees C)", leg=:bottomright)
plot!(f, 1955, 2000, label="linear fit")
```
If we use a global cubic polynomial, the points are fit more closely.
```julia
V = [ t[i]^j for i in 1:length(t), j in 0:3 ]
@show size(V);
```
```
size(V) = (10, 4)
```
Now we solve the new least-squares problem to redefine the fitting polynomial.
Tip
The definition of f above is in terms of p. When p is changed, then f calls the new version.
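This rebinding behavior is easy to check with a throwaway example (the names here are for illustration only): a function defined in terms of a global name picks up whatever that name is currently bound to.

```julia
p = x -> 2x
f = y -> p(y) + 1
@show f(3)          # uses the current p: 2*3 + 1 = 7
p = x -> x^2        # rebind p; f now calls the new version
@show f(3)          # 3^2 + 1 = 10
```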
```julia
p = Polynomial( V \ temp )
plot!(f, 1955, 2000, label="cubic fit")
```
If we were to continue increasing the degree of the polynomial, the residual at the data points would get smaller, but overfitting would increase.
Example 3.1.3
```julia
a = [1/k^2 for k=1:100]
s = cumsum(a)        # cumulative summation
p = @. sqrt(6*s)
scatter(1:100, p;
    title="Sequence convergence",
    xlabel=L"k", ylabel=L"p_k")
```
This graph suggests that maybe $p_k \to \pi$, but it's far from clear how close the sequence gets. It's more informative to plot the sequence of errors, $\epsilon_k = |\pi - p_k|$. By plotting the error sequence on a log-log scale, we can see a nearly linear relationship.
```julia
ϵ = @. abs(π - p)    # error sequence
scatter(1:100, ϵ;
    title="Convergence of errors",
    xaxis=(:log10, L"k"), yaxis=(:log10, "error"))
```
The straight line on the log-log scale suggests a power-law relationship where $\epsilon_k \approx a k^b$, or $\log \epsilon_k \approx b (\log k) + \log a$.
```julia
k = 1:100
V = [ k.^0 log.(k) ]    # fitting matrix
c = V \ log.(ϵ)         # coefficients of linear fit
```
```
2-element Vector{Float64}:
 -0.1823752497282998
 -0.9674103233127929
```
In terms of the parameters $a$ and $b$ used above, we have:
```julia
a, b = exp(c[1]), c[2];
@show b;
```
```
b = -0.9674103233127929
```
It's tempting to conjecture that the slope $b \to -1$ asymptotically. Here is how the numerical fit compares to the original convergence curve.
```julia
plot!(k, a * k.^b, l=:dash, label="power-law fit")
```
3.2 The normal equations¶
Example 3.2.1
Because the functions $\sin^2(t)$, $\cos^2(t)$, and 1 are linearly dependent, we should find that the following matrix is somewhat ill-conditioned.
Tip
The local variable scoping rule for loops applies to comprehensions as well.
```julia
t = range(0, 3, 400)
f = [ x -> sin(x)^2, x -> cos((1 + 1e-7) * x)^2, x -> 1. ]
A = [ f(t) for t in t, f in f ]
@show κ = cond(A);
```
```
κ = cond(A) = 1.8253225428206295e7
```
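The comprehension scoping noted in the tip can be verified with a throwaway example: the loop variable of a comprehension is local to it, so it shadows an outer variable of the same name without modifying it.

```julia
t = 100
squares = [t^2 for t in 1:3]    # this t is local to the comprehension
@show squares                   # [1, 4, 9]
@show t                         # the outer t is untouched: still 100
```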
Now we set up an artificial linear least-squares problem with a known exact solution that actually makes the residual zero.
```julia
x = [1., 2, 1]
b = A * x;
```
Using backslash to find the least-squares solution, we get a relative error that is well below $\kappa$ times machine epsilon.
```julia
x_BS = A \ b
@show observed_error = norm(x_BS - x) / norm(x);
@show error_bound = κ * eps();
```
```
observed_error = norm(x_BS - x) / norm(x) = 1.7978982202463188e-10
error_bound = κ * eps() = 4.053030228813602e-9
```
If we formulate and solve via the normal equations, we get a much larger relative error. With $\kappa(A^T A) = \kappa(A)^2 \approx 10^{14}$, we may not be left with more than about 2 accurate digits.
```julia
N = A' * A
x_NE = N \ (A' * b)
@show observed_err = norm(x_NE - x) / norm(x);
@show digits = -log10(observed_err);
```
```
observed_err = norm(x_NE - x) / norm(x) = 0.036205099396071284
digits = -(log10(observed_err)) = 1.441230255886605
```
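The squaring of the condition number is not particular to this example: in the 2-norm, $\kappa(A^T A) = \kappa(A)^2$ for any full-rank matrix, because forming $A^TA$ squares the singular values. A quick check on a small tall matrix made up for illustration:

```julia
using LinearAlgebra

A = [1.0 1; 1 1.001; 1 2]    # mildly ill-conditioned tall matrix
@show cond(A)
@show cond(A' * A)           # matches cond(A)^2
```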
3.3 The QR factorization¶
Example 3.3.1
Julia provides access to both the thin and full forms of the QR factorization.
```julia
A = rand(1.:9., 6, 4)
@show m,n = size(A);
```
```
(m, n) = size(A) = (6, 4)
```
Here is a standard call:
```julia
Q,R = qr(A);
Q
```
```
6×6 LinearAlgebra.QRCompactWYQ{Float64, Matrix{Float64}, Matrix{Float64}}
```
```julia
R
```
```
4×4 Matrix{Float64}:
 -8.18535   -7.33017  -8.42969  -12.2169
  0.0      -11.0122   -4.55939   -3.67299
  0.0        0.0      -5.92893   -3.24988
  0.0        0.0       0.0        1.92189
```
If you look carefully, you see that we seemingly got a full $Q$ but a thin $R$. However, the $Q$ above is not a standard matrix type. If you convert it to a true matrix, then it reverts to the thin form.
Tip
To enter the accented character Q̂, type Q\hat followed by Tab.
```julia
Q̂ = Matrix(Q)
```
```
6×4 Matrix{Float64}:
 -0.366508  -0.210079    0.176657   0.689362
 -0.122169  -0.463529   -0.144501   0.174469
 -0.488678  -0.0379498  -0.119342   0.261521
 -0.244339  -0.200592   -0.847661  -0.248005
 -0.122169  -0.735954    0.402324  -0.462142
 -0.733017   0.397117    0.230816  -0.388415
```
We can test that $Q$ is an orthogonal matrix:
```julia
opnorm(Q' * Q - I)
```
```
1.4148859525188886e-15
```
The thin $\hat{Q}$ cannot be an orthogonal matrix, because it is not square, but it is still ONC:
```julia
Q̂' * Q̂ - I
```
```
4×4 Matrix{Float64}:
  4.44089e-16   9.04632e-17  -7.89136e-17  -2.88823e-16
  9.04632e-17  -2.22045e-16  -8.61465e-17  -4.80516e-17
 -7.89136e-17  -8.61465e-17  -7.77156e-16   2.08047e-16
 -2.88823e-16  -4.80516e-17   2.08047e-16   2.22045e-16
```
Example 3.3.2
We’ll repeat the experiment of Example 3.2.1, which exposed instability in the normal equations.
```julia
t = range(0, 3, 400)
f = [ x -> sin(x)^2, x -> cos((1 + 1e-7) * x)^2, x -> 1. ]
A = [ f(t) for t in t, f in f ]
x = [1., 2, 1]
b = A * x;
```
The error in the solution by Function 3.3.2 is similar to the bound predicted by the condition number.
```julia
observed_error = norm(FNC.lsqrfact(A, b) - x) / norm(x);
@show observed_error;
@show error_bound = cond(A) * eps();
```
```
observed_error = 5.401298095140833e-10
error_bound = cond(A) * eps() = 4.053030228813602e-9
```
3.4 Computing QR factorizations¶
Example 3.4.1
We will use Householder reflections to produce a QR factorization of a random matrix.
Tip
The rand function can select randomly from within the interval $[0, 1)$, or from a vector or range that you specify.
```julia
A = rand(float(1:9), 6, 4)
m,n = size(A)
```
```
(6, 4)
```
Our first step is to introduce zeros below the diagonal in column 1 by using (3.4.4) and (3.4.1).
Tip
I can stand for an identity matrix of any size, inferred from the context when needed.
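Concretely, I is a UniformScaling object that takes on whatever dimensions the surrounding expression requires; a quick check with arbitrary example values:

```julia
using LinearAlgebra

B = [1.0 2; 3 4]
@show B - 2I                               # I acts as the 2×2 identity here
@show B - 2I == B - 2 * Matrix(I, 2, 2)    # same as an explicit identity
```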
```julia
z = A[:, 1];
v = normalize(z - norm(z) * [1; zeros(m-1)])
P₁ = I - 2v * v'    # reflector
```
```
6×6 Matrix{Float64}:
 0.203069   0.609208    0.203069    0.101535    0.609208   0.406138
 0.609208   0.534296   -0.155235   -0.0776174  -0.465704  -0.310469
 0.203069  -0.155235    0.948255   -0.0258725  -0.155235  -0.10349
 0.101535  -0.0776174  -0.0258725   0.987064   -0.0776174 -0.0517449
 0.609208  -0.465704   -0.155235   -0.0776174   0.534296  -0.310469
 0.406138  -0.310469   -0.10349    -0.0517449  -0.310469   0.79302
```
We check that this reflector introduces zeros as it should:
```julia
P₁ * z
```
```
6-element Vector{Float64}:
 9.848857801796106
 8.881784197001252e-16
 1.6653345369377348e-16
 8.326672684688674e-17
 2.220446049250313e-16
 0.0
```
Now we replace $A$ by $P_1 A$.
```julia
A = P₁ * A
```
```
6×4 Matrix{Float64}:
 9.84886      13.301     10.4581    9.95039
 8.88178e-16  -0.874548   1.12095   0.451268
 1.66533e-16   1.37515    7.37365   4.48376
 8.32667e-17   3.68758    2.68683   3.24188
 2.22045e-16   1.12545    3.12095  -0.548732
 0.0          -1.2497     3.7473    1.96751
```
We are set to put zeros into column 2. We must not use row 1 in any way, lest it destroy the zeros we just introduced. So we leave it out of the next reflector.
```julia
z = A[2:m, 2]
v = normalize(z - norm(z) * [1; zeros(m-2)])
P₂ = I - 2v * v'
```
```
5×5 Matrix{Float64}:
 -0.200201   0.314799    0.844158   0.257638   -0.28608
  0.314799   0.917432   -0.221413  -0.0675754   0.0750355
  0.844158  -0.221413    0.406264  -0.181209    0.201214
  0.257638  -0.0675754  -0.181209   0.944695    0.0614106
 -0.28608    0.0750355   0.201214   0.0614106   0.93181
```
We now apply this reflector to rows 2 and below only.
```julia
A[2:m, :] = P₂ * A[2:m, :]
A
```
```
6×4 Matrix{Float64}:
  9.84886      13.301        10.4581    9.95039
  2.10758e-18   4.36835       4.09695   3.35355
  3.98939e-16  -2.40362e-16   6.59308   3.72252
  7.06482e-16  -6.42647e-16   0.593665  1.20057
  4.12251e-16  -1.15338e-16   2.48212  -1.17174
 -2.11204e-16   3.13734e-18   4.45666   2.6593
```
We need to iterate the process for the last two columns.
```julia
for j in 3:n
    z = A[j:m, j]
    v = normalize(z - norm(z) * [1; zeros(m-j)])
    P = I - 2v * v'
    A[j:m, :] = P * A[j:m, :]
end
```
We have now reduced the original $A$ to an upper triangular matrix using four orthogonal Householder reflections:
```julia
R = triu(A)
```
```
6×4 Matrix{Float64}:
 9.84886  13.301    10.4581   9.95039
 0.0       4.36835   4.09695  3.35355
 0.0       0.0       8.35726  4.09211
 0.0       0.0       0.0      2.64538
 0.0       0.0       0.0      0.0
 0.0       0.0       0.0      0.0
```