
Chapter 3

Functions

Solution of least squares by the normal equations
lsnormal.m
function x = lsnormal(A,b)
% LSNORMAL   Solve linear least squares by normal equations.
% Input: 
%   A     coefficient matrix (m by n, m>n)
%   b     right-hand side (m by 1)
% Output:
%   x     minimizer of || b-Ax ||

N = A'*A;  z = A'*b;
R = chol(N);
w = forwardsub(R',z);                   % forward solve R'w = z
x = backsub(R,w);                       % back solve Rx = w
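
A minimal usage sketch, not part of the original listing: it assumes forwardsub and backsub from Chapter 2 are on the path and uses a made-up overdetermined system.

A = rand(20, 4);                      % made-up 20x4 overdetermined system
b = rand(20, 1);
x = lsnormal(A, b);
orthogonality_check = norm(A'*(b - A*x))   % least-squares residual is orthogonal to range(A), so this is near zero
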
Solution of least squares by QR factorization
lsqrfact.m
function x = lsqrfact(A,b)
% LSQRFACT   Solve linear least squares by QR factorization.
% Input: 
%   A     coefficient matrix (m by n, m>n)
%   b     right-hand side (m by 1)
% Output:
%   x     minimizer of || b-Ax ||

[Q,R] = qr(A,0);                        % compressed factorization
c = Q'*b;
x = backsub(R,c);                       
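
A quick usage sketch, not part of the original listing, assuming backsub from Chapter 2 is available; for a well-conditioned problem the result should agree with backslash to roundoff.

A = rand(20, 4);  b = rand(20, 1);    % made-up overdetermined system
x = lsqrfact(A, b);
difference = norm(x - A\b)            % should be on the order of machine epsilon
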
QR factorization by Householder reflections
qrfact.m
function [Q,R] = qrfact(A)
% QRFACT   QR factorization by Householder reflections.
% (demo only--not efficient)
% Input:
%   A      m-by-n matrix
% Output:
%   Q,R    A=QR, Q m-by-m orthogonal, R m-by-n upper triangular

[m,n] = size(A);
Q = eye(m);
for k = 1:n
  z = A(k:m,k);
  v = [ -sign(z(1))*norm(z) - z(1); -z(2:end) ];
  nrmv = norm(v);
  if nrmv < eps, continue, end       % nothing is done in this iteration
  v = v / nrmv;                      % removes v'*v in other formulas
  % Apply the reflection to each relevant column of A and Q
  for j = 1:n
    A(k:m,j) = A(k:m,j) - v*( 2*(v'*A(k:m,j)) );
  end
  for j = 1:m
    Q(k:m,j) = Q(k:m,j) - v*( 2*(v'*Q(k:m,j)) );
  end
end

Q = Q';
R = triu(A);                         % enforce exact triangularity
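
A short sanity check of the factorization on a random matrix, a sketch rather than anything from the text:

A = rand(6, 4);
[Q, R] = qrfact(A);
factorization_residual = norm(Q*R - A)      % should be near machine epsilon
orthogonality_check = norm(Q'*Q - eye(6))   % Q should be orthogonal to roundoff
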

Examples

3.1 Fitting functions to data

Example 3.1.1

Here are 5-year averages of the worldwide temperature anomaly as compared to the 1951–1980 average (source: NASA).

year = (1955:5:2000)';
y = [ -0.0480; -0.0180; -0.0360; -0.0120; -0.0040;
    0.1180; 0.2100; 0.3320; 0.3340; 0.4560 ];
scatter(year, y), axis tight
xlabel('year')
ylabel('anomaly ({\circ}C)')

A polynomial interpolant can be used to fit the data. Here we build one using a Vandermonde matrix. First, though, we express time as decades since 1950, as it improves the condition number of the matrix.
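
As a rough check of that claim (not part of the original demo), one can compare condition numbers of full Vandermonde matrices built from the raw years and from the rescaled times, using MATLAB's vander and the year vector defined above:

t = (year - 1950) / 10;
cond_raw = cond(vander(year))      % powers of the raw years: astronomically ill-conditioned
cond_scaled = cond(vander(t))      % powers of the rescaled times: far smaller
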

t = (year - 1950) / 10;  
n = length(t);
V = ones(n, 1);    % t^0
for j = 1:n-1
    V(:, j+1) = t .* V(:,j);
end
c = V \ y;    % solve for coefficients

We created the Vandermonde matrix columns in increasing-degree order. Thus, the coefficients in c also follow that ordering, which is the opposite of what MATLAB uses. We need to flip the coefficients before using them in polyval.
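
As a reminder of that convention (a trivial check, not in the original demo), polyval takes coefficients in descending-degree order:

polyval([1 2 3], 10)    % evaluates 1*10^2 + 2*10 + 3, giving 123
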

p = @(year) polyval(c(end:-1:1), (year - 1950) / 10);
hold on
fplot(p, [1955, 2000])    % plot the interpolating function
Example 3.1.2

Here are the 5-year temperature averages again.

year = (1955:5:2000)';
y = [ -0.0480; -0.0180; -0.0360; -0.0120; -0.0040;
    0.1180; 0.2100; 0.3320; 0.3340; 0.4560 ];

The standard best-fit line results from using a linear polynomial that meets the least-squares criterion.

t = (year - 1955) / 10;    % better matrix conditioning later
V = [ t.^0 t ];            % Vandermonde-ish matrix
size(V)
c = V \ y;
f = @(year) polyval(c(end:-1:1), (year - 1955) / 10);
clf
scatter(year, y), axis tight
xlabel('year'), ylabel('anomaly ({\circ}C)')
hold on
fplot(f, [1955, 2000])

If we use a global cubic polynomial, the points are fit more closely.

V = [t.^0, t.^1, t.^2, t.^3];    % Vandermonde-ish matrix  
size(V)

Now we solve the new least-squares problem to redefine the fitting polynomial.

c = V \ y;
f = @(year) polyval(c(end:-1:1), (year - 1955) / 10);
fplot(f, [1955, 2000]) 
legend('data', 'linear', 'cubic', 'Location', 'northwest');

If we were to continue increasing the degree of the polynomial, the residual at the data points would get smaller, but overfitting would increase.
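
Here is a short, unofficial check of the first claim, reporting the residual norm as the degree grows; it assumes t and y from the cells above are still defined.

for deg = 1:6
    V = ones(length(t), 1);
    for j = 1:deg
        V(:, j+1) = t .* V(:, j);    % append the column t.^j
    end
    c = V \ y;
    fprintf('degree %d: residual norm = %.2e\n', deg, norm(y - V*c))
end
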

Example 3.1.3
k = (1:100)';
a = 1./k.^2;      % sequence
s = cumsum(a);    % cumulative summation
p = sqrt(6*s);
clf
plot(k, p, 'o-')
xlabel('k'), ylabel('p_k')
title('Sequence converging to \pi')

This graph suggests that maybe $p_k\to \pi$, but it’s far from clear how close the sequence gets. It’s more informative to plot the sequence of errors, $\epsilon_k = |\pi - p_k|$. By plotting the error sequence on a log-log scale, we can see a nearly linear relationship.

ep = abs(pi - p);    % error sequence
loglog(k, ep, 'o')
title('Convergence')
xlabel('k'), ylabel('|p_k - \pi|'), axis tight

The straight line on the log-log scale suggests a power-law relationship where $\epsilon_k \approx a k^b$, or $\log \epsilon_k \approx b (\log k) + \log a$.

V = [ k.^0, log(k) ];    % fitting matrix
c = V \ log(ep)          % coefficients of linear fit

In terms of the parameters $a$ and $b$ used above, we have

a = exp(c(1)),  b = c(2)

It’s tempting to conjecture that the slope $b \to -1$ asymptotically. Here is how the numerical fit compares to the original convergence curve.

hold on
loglog(k, a * k.^b)
legend('sequence', 'power-law fit');
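
As an unofficial check of that conjecture, we can refit using only the tail of the sequence, where the asymptotic behavior should dominate; this assumes k and ep from above are still defined.

tail = (50:100)';
V = [ tail.^0, log(k(tail)) ];
c_tail = V \ log(ep(tail));
slope_tail = c_tail(2)    % expect a value closer to -1 than the full-range fit
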

3.2 The normal equations

Example 3.2.1

Because the functions $\sin^2(t)$, $\cos^2(t)$, and 1 are linearly dependent, we should find that the following matrix is somewhat ill-conditioned.

t = linspace(0, 3, 400)';
A = [ sin(t).^2, cos((1+1e-7)*t).^2, t.^0 ];
kappa = cond(A)

Now we set up an artificial linear least-squares problem with a known exact solution that actually makes the residual zero.

x = [1; 2; 1];
b = A * x;

Using backslash to find the least-squares solution, we get a relative error that is well below κ times machine epsilon.

x_BS = A \ b;
observed_err = norm(x_BS - x) / norm(x)
max_err = kappa * eps

If we formulate and solve via the normal equations, we get a much larger relative error. With $\kappa^2 \approx 10^{14}$, we may not be left with more than about 2 accurate digits.

N = A'*A;
x_NE = N\(A'*b);
observed_err = norm(x_NE - x) / norm(x)
digits = -log10(observed_err)

3.3 The QR factorization

Example 3.3.1

MATLAB provides access to both the thin and full forms of the QR factorization.

A = magic(5);
A = A(:, 1:4);
[m, n] = size(A)

Here is the full form:

[Q, R] = qr(A);
szQ = size(Q), szR = size(R)

We can test that $\mathbf{Q}$ is an orthogonal matrix:

QTQ = Q' * Q
norm(QTQ - eye(m))

With a second input argument given to qr, the thin form is returned. (This is usually the one we want in practice.)

[Q_hat, R_hat] = qr(A, 0);
szQ_hat = size(Q_hat), szR_hat = size(R_hat)

Now $\hat{\mathbf{Q}}$ cannot be an orthogonal matrix, because it is not square, but it is still ONC. Mathematically, $\hat{\mathbf{Q}}^T \hat{\mathbf{Q}}$ is a $4\times 4$ identity matrix.

Q_hat' * Q_hat - eye(n)
Example 3.3.2

We’ll repeat the experiment of Demo 3.2.1, which exposed instability in the normal equations.

t = linspace(0, 3, 400)';
A = [ sin(t).^2, cos((1+1e-7)*t).^2, t.^0 ];
x = [1; 2; 1];
b = A * x;

The error in the solution by Function 3.3.2 is similar to the bound predicted by the condition number.

observed_error = norm(lsqrfact(A, b) - x) / norm(x)
error_bound = cond(A) * eps

3.4 Computing QR factorizations

Example 3.4.1

We will use Householder reflections to produce a QR factorization of a matrix.

A = magic(6);
A = A(:, 1:4);
[m, n] = size(A)

Our first step is to introduce zeros below the diagonal in column 1 by using (3.4.4) and (3.4.1).

z = A(:, 1);
v = z - norm(z) * eye(m,1);
P_1 = eye(m) - 2 / (v' * v) * (v * v');

We check that this reflector introduces zeros as it should:

P_1 * z

Now we replace $\mathbf{A}$ by $\mathbf{P}_1\mathbf{A}$.

A = P_1 * A

We are set to put zeros into column 2. We must not use row 1 in any way, lest it destroy the zeros we just introduced. So we leave it out of the next reflector.

z = A(2:m, 2);
v = z - norm(z) * eye(m-1, 1);
P_2 = eye(m-1) - 2 / (v' * v) * (v * v');

We now apply this reflector to rows 2 and below only.

A(2:m, 2:n) = P_2 * A(2:m, 2:n)

We need to iterate the process for the last two columns.

for j = 3:n
    z = A(j:m,j);
    k = m-j+1;
    v = z - norm(z) * eye(k, 1);
    P = eye(k) - 2 / (v' * v) * (v * v');
    A(j:m, j:n) = P * A(j:m, j:n);
end

We have now reduced the original to an upper triangular matrix using four orthogonal Householder reflections:

R = A
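
As an optional cross-check, not part of the original demo: the qrfact function listed at the top of this chapter packages these same steps, and orthogonal reflections preserve the 2-norm of every column, so both quantities below should be at roundoff level (this assumes qrfact.m is on the path).

A = magic(6);  A = A(:, 1:4);              % rebuild the original matrix
[Q2, R2] = qrfact(A);
norm(Q2*R2 - A)                            % factorization residual
norm(sqrt(sum(R.^2)) - sqrt(sum(A.^2)))    % column norms of R match those of A
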