Computing QR factorizations
Tobin A. Driscoll, Richard J. Braun
It is possible to compute a thin QR factorization using the outer product formula (2.4.4), as we did with LU. However, to compute the factorization stably, a better strategy is to introduce zeros into the lower triangle, one column at a time, using orthogonal matrices. Thanks to Theorem 3.3.2, the product of orthogonal matrices is also orthogonal.
3.4.1 Householder reflections

We will use a particular type of orthogonal matrix: a Householder reflector is a matrix of the form

$$
\mathbf{P} = \mathbf{I} - 2 \mathbf{v} \mathbf{v}^T, \qquad (3.4.1)
$$

where $\mathbf{v}$ is any unit vector (in the 2-norm).
In Exercise 3.4.2 you are asked to show that such a $\mathbf{P}$ is necessarily orthogonal. Note that for any vector $\mathbf{x}$ of appropriate dimension,
$$
\mathbf{P}\mathbf{x} = \mathbf{x} - 2 \mathbf{v} (\mathbf{v}^T\mathbf{x}). \qquad (3.4.2)
$$

The reason $\mathbf{P}$ is called a reflector is sketched in Figure 3.4.1.
Figure 3.4.1: A Householder reflector.

Because $\mathbf{v}$ is a unit vector, $\mathbf{v}^T\mathbf{x}$ is the component of $\mathbf{x}$ in the direction of $\mathbf{v}$. Hence subtracting $(\mathbf{v}^T\mathbf{x})\mathbf{v}$ projects $\mathbf{x}$ into a hyperplane orthogonal to $\mathbf{v}$. By subtracting off twice as much, we get the reflection of $\mathbf{x}$ through the hyperplane instead.
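Here is a minimal numerical sketch (the unit vector is chosen arbitrarily) confirming that such a $\mathbf{P}$ is symmetric, orthogonal, and reverses $\mathbf{v}$ itself:

using LinearAlgebra
v = normalize(randn(5))    # an arbitrary unit vector
P = I - 2v * v'            # Householder reflector
@show norm(P - P');        # symmetric
@show norm(P' * P - I);    # orthogonal (result ≈ 0)
@show P * v ≈ -v;          # v is reflected to -v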
Given a vector $\mathbf{z}$, we can choose $\mathbf{v}$ so that $\mathbf{P}$ reflects $\mathbf{z}$ onto the $x_1$-axis, i.e., so that $\mathbf{P}\mathbf{z}$ is nonzero only in the first element. Because orthogonal matrices preserve the 2-norm, we must have
$$
\mathbf{P}\mathbf{z} = \begin{bmatrix} \pm \| \mathbf{z} \| \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \pm \| \mathbf{z} \| \mathbf{e}_1. \qquad (3.4.3)
$$

(Recall that $\mathbf{e}_k$ is the $k$th column of the identity matrix.) We choose the positive sign above for our discussion, but see Function 3.4.1 and Exercise 3.4.4 for important computational details. Let
$$
\mathbf{w} = \| \mathbf{z} \| \mathbf{e}_1 - \mathbf{z}, \quad \mathbf{v} = \frac{\mathbf{w}}{\|\mathbf{w}\|}. \qquad (3.4.4)
$$

If it turns out that $\mathbf{w}=\boldsymbol{0}$, then $\mathbf{z}$ is already in the target form and we can take $\mathbf{P}=\mathbf{I}$. Otherwise, we have the following.
Theorem 3.4.1: Let $\mathbf{v}$ be given by (3.4.4) and $\mathbf{P}$ by (3.4.1). Then $\mathbf{P}$ is symmetric and orthogonal, and $\mathbf{P}\mathbf{z} = \|\mathbf{z}\|\,\mathbf{e}_1$.

The proofs of symmetry and orthogonality are left to Exercise 3.4.2. For the last fact, we use (3.4.2) to compute
$$
\mathbf{P}\mathbf{z} = \mathbf{z} - 2 \frac{\mathbf{w}^T \mathbf{z}}{\mathbf{w}^T\mathbf{w}} \mathbf{w}.
$$

Since $\mathbf{e}_1^T\mathbf{z}=z_1$,
$$
\begin{split}
\mathbf{w}^T\mathbf{w} &= \| \mathbf{z} \|^2 - 2 \| \mathbf{z} \| z_1 + \mathbf{z}^T\mathbf{z} = 2\| \mathbf{z} \|\bigl(\| \mathbf{z} \|-z_1\bigr),\\
\mathbf{w}^T\mathbf{z} &= \| \mathbf{z} \| z_1 - \mathbf{z}^T\mathbf{z} = -\| \mathbf{z} \|\bigl(\| \mathbf{z} \|-z_1\bigr),
\end{split}
$$

leading finally to
$$
\mathbf{P}\mathbf{z} = \mathbf{z} - 2\cdot \frac{-\| \mathbf{z} \| \bigl(\| \mathbf{z} \|-z_1\bigr)}{2\| \mathbf{z} \| \bigl(\| \mathbf{z} \|-z_1\bigr)} \mathbf{w} = \mathbf{z} + \mathbf{w} = \| \mathbf{z} \|\mathbf{e}_1.
$$

3.4.2 Factorization algorithm

The QR factorization is computed by using successive Householder reflections to introduce zeros in one column at a time. We first show the process for a small numerical example in Example 3.4.1.
Example 3.4.1 (Householder QR factorization)
(Julia) We will use Householder reflections to produce a QR factorization of a random matrix.
Tip: The rand function can select randomly from within the interval $[0,1]$, or from a vector or range that you specify.
A = rand(float(1:9), 6, 4)
m,n = size(A)
Our first step is to introduce zeros below the diagonal in column 1 by using (3.4.4) and (3.4.1).
Tip: I can stand for an identity matrix of any size, inferred from the context when needed.
z = A[:, 1];
v = normalize(z - norm(z) * [1; zeros(m-1)])
P₁ = I - 2v * v' # reflector
6×6 Matrix{Float64}:
0.455842 0.455842 0.227921 0.341882 0.455842 0.455842
0.455842 0.61814 -0.19093 -0.286395 -0.38186 -0.38186
0.227921 -0.19093 0.904535 -0.143198 -0.19093 -0.19093
0.341882 -0.286395 -0.143198 0.785204 -0.286395 -0.286395
0.455842 -0.38186 -0.19093 -0.286395 0.61814 -0.38186
0.455842 -0.38186 -0.19093 -0.286395 -0.38186 0.61814
We check that this reflector introduces zeros as it should:

P₁ * z
6-element Vector{Float64}:
17.549928774784245
8.881784197001252e-16
4.440892098500626e-16
1.7763568394002505e-15
8.881784197001252e-16
8.881784197001252e-16
Now we replace $\mathbf{A}$ by $\mathbf{P}_1\mathbf{A}$.

A = P₁ * A
6×4 Matrix{Float64}:
17.5499 10.3704 12.7636 15.3847
8.88178e-16 3.66349 4.17183 -2.86157
4.44089e-16 3.33174 3.58592 -2.93078
1.77636e-15 0.997617 0.378873 1.10382
8.88178e-16 -4.33651 -0.82817 0.138431
8.88178e-16 -3.33651 -2.82817 1.13843
We are set to put zeros into column 2. We must not use row 1 in any way, lest it destroy the zeros we just introduced. So we leave it out of the next reflector.
z = A[2:m, 2]
v = normalize(z - norm(z) * [1; zeros(m-2)])
P₂ = I - 2v * v'
5×5 Matrix{Float64}:
0.491956 0.447407 0.133966 -0.582334 -0.448047
0.447407 0.605992 -0.117977 0.51283 0.394572
0.133966 -0.117977 0.964674 0.153556 0.118146
-0.582334 0.51283 0.153556 0.332514 -0.513564
-0.448047 0.394572 0.118146 -0.513564 0.604864
We now apply this reflector to rows 2 and below only.
A[2:m, :] = P₂ * A[2:m, :]
A
6×4 Matrix{Float64}:
17.5499 10.3704 12.7636 15.3847
-4.15569e-17 7.44678 5.4569 -3.16183
1.26286e-15 -2.66459e-16 2.45422 -2.66636
2.02152e-15 -1.48703e-16 0.0400116 1.183
-1.77509e-16 8.12315e-16 0.644816 -0.205736
6.82384e-17 2.66214e-16 -1.69485 0.873629
We need to iterate the process for the last two columns.
for j in 3:n
    z = A[j:m, j]
    v = normalize(z - norm(z) * [1; zeros(m-j)])
    P = I - 2v * v'
    A[j:m, :] = P * A[j:m, :]
end
We have now reduced the original to an upper triangular matrix using four orthogonal Householder reflections:

R = triu(A)
6×4 Matrix{Float64}:
17.5499 10.3704 12.7636 15.3847
0.0 7.44678 5.4569 -3.16183
0.0 0.0 3.05174 -2.65745
0.0 0.0 0.0 1.50083
0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0
(MATLAB) We will use Householder reflections to produce a QR factorization of a matrix.
A = magic(6);
A = A(:, 1:4);
[m, n] = size(A)
Our first step is to introduce zeros below the diagonal in column 1 by using (3.4.4) and (3.4.1).
z = A(:, 1);
v = z - norm(z) * eye(m,1);
P_1 = eye(m) - 2 / (v' * v) * (v * v');
We check that this reflector introduces zeros as it should:

P_1 * z
Now we replace $\mathbf{A}$ by $\mathbf{P}_1\mathbf{A}$.

A = P_1 * A
We are set to put zeros into column 2. We must not use row 1 in any way, lest it destroy the zeros we just introduced. So we leave it out of the next reflector.
z = A(2:m, 2);
v = z - norm(z) * eye(m-1, 1);
P_2 = eye(m-1) - 2 / (v' * v) * (v * v');
We now apply this reflector to rows 2 and below only.
A(2:m, 2:n) = P_2 * A(2:m, 2:n)
We need to iterate the process for the last two columns.
for j = 3:n
    z = A(j:m, j);
    k = m-j+1;
    v = z - norm(z) * eye(k, 1);
    P = eye(k) - 2 / (v' * v) * (v * v');
    A(j:m, j:n) = P * A(j:m, j:n);
end
We have now reduced the original to an upper triangular matrix using four orthogonal Householder reflections:

R = triu(A)
(Python) We will use Householder reflections to produce a QR factorization of a matrix.
A = 1.0 + floor(9 * random.rand(6,4))
m, n = A.shape
print(A)
[[7. 5. 6. 7.]
[9. 6. 8. 9.]
[3. 7. 1. 9.]
[9. 1. 4. 8.]
[6. 5. 2. 2.]
[4. 9. 2. 8.]]
Our first step is to introduce zeros below the diagonal in column 1 by using (3.4.4) and (3.4.1).
z = A[:, 0]
v = z - norm(z) * hstack([1, zeros(m-1)])
P_1 = eye(m) - (2 / dot(v, v)) * outer(v, v) # reflector
We check that this reflector introduces zeros as it should:

print(P_1 @ z)
[ 1.64924225e+01 -1.33226763e-15 3.33066907e-16 -4.44089210e-16
0.00000000e+00 0.00000000e+00]
Now we replace $\mathbf{A}$ by $\mathbf{P}_1\mathbf{A}$.

A = P_1 @ A
print(A)
[[ 1.64924225e+01 1.12172727e+01 1.04896658e+01 1.65530564e+01]
[-1.33226763e-15 1.05250382e-01 3.74323709e+00 -5.74885027e-02]
[ 3.33066907e-16 5.03508346e+00 -4.18920970e-01 5.98083717e+00]
[-4.44089210e-16 -4.89474962e+00 -2.56762911e-01 -1.05748850e+00]
[ 0.00000000e+00 1.07016692e+00 -8.37841941e-01 -4.03832567e+00]
[ 0.00000000e+00 6.38011128e+00 1.08105373e-01 3.97444955e+00]]
We are set to put zeros into column 2. We must not use row 1 in any way, lest it destroy the zeros we just introduced. So we leave it out of the next reflector.
z = A[1:, 1]
v = z - norm(z) * hstack([1, zeros(m-2)])
P_2 = eye(m-1) - (2 / dot(v, v)) * outer(v, v)
We now apply this reflector to rows 2 and below only.
A[1:, 1:] = P_2 @ A[1:, 1:]
print(A)
[[ 1.64924225e+01 1.12172727e+01 1.04896658e+01 1.65530564e+01]
[-1.33226763e-15 9.54844459e+00 -6.96910549e-02 5.89832746e+00]
[ 3.33066907e-16 -3.05463033e-16 1.61412113e+00 2.80521356e+00]
[-4.44089210e-16 6.54521036e-16 -2.23314168e+00 2.02962665e+00]
[ 0.00000000e+00 -1.85516333e-16 -4.05735020e-01 -4.71327919e+00]
[ 0.00000000e+00 -6.44032953e-16 2.68423643e+00 -4.94821546e-02]]
We need to iterate the process for the last two columns.
for j in [2, 3]:
    z = A[j:, j]
    v = z - norm(z) * hstack([1, zeros(m-j-1)])
    P = eye(m-j) - (2 / dot(v, v)) * outer(v, v)
    A[j:, j:] = P @ A[j:, j:]
We have now reduced the original to an upper triangular matrix using four orthogonal Householder reflections:

print(triu(A))
[[16.4924225 11.21727266 10.48966578 16.55305641]
[ 0. 9.54844459 -0.06969105 5.89832746]
[ 0. 0. 3.86808156 0.45889189]
[ 0. 0. 0. 5.83056386]
[ 0. 0. 0. 0. ]
[ 0. 0. 0. 0. ]]
You may be wondering what happened to $\mathbf{Q}$ in Example 3.4.1. Each Householder reflector is orthogonal but not full-size. We have to pad it out to represent algebraically the fact that a block of the first rows is left alone. Given a reflector $\mathbf{P}_k$ of square size $m-k+1$, we define
$$
\mathbf{Q}_k = \begin{bmatrix} \mathbf{I}_{k-1} & \boldsymbol{0} \\ \boldsymbol{0} & \mathbf{P}_k \end{bmatrix}.
$$

It is easy to show that $\mathbf{Q}_k$ is also orthogonal. Then the algorithm produces
$$
\mathbf{Q}_n \mathbf{Q}_{n-1}\cdots \mathbf{Q}_1 \mathbf{A} = \mathbf{R}.
$$

But $\mathbf{Q}_n \mathbf{Q}_{n-1}\cdots \mathbf{Q}_1$ is orthogonal too, and we multiply on the left by its transpose to get $\mathbf{A}=\mathbf{Q}\mathbf{R}$, where $\mathbf{Q} = (\mathbf{Q}_n \mathbf{Q}_{n-1}\cdots \mathbf{Q}_1)^T$. We don't even need to form these matrices explicitly. Writing
$$
\mathbf{Q}^T = \mathbf{Q}_n \mathbf{Q}_{n-1}\cdots \mathbf{Q}_1 = \mathbf{Q}_n \Bigl( \mathbf{Q}_{n-1}\bigl(\cdots (\mathbf{Q}_1\mathbf{I})\cdots\bigr)\Bigr),
$$

we can build $\mathbf{Q}^T$ iteratively by starting with the identity and doing the same row operations as on $\mathbf{A}$. That process uses much less memory than building the $\mathbf{Q}_k$ matrices explicitly.
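For instance, here is a sketch of that accumulation in Julia, mirroring the steps of Example 3.4.1 (the matrix is arbitrary, and roundoff below the diagonal is trimmed at the end):

using LinearAlgebra
A = rand(float(1:9), 6, 4);  m, n = size(A);
R = copy(A);  Qt = Matrix{Float64}(I, m, m);
for k in 1:n
    z = R[k:m, k]
    v = normalize(z - norm(z) * [1; zeros(m-k)])
    R[k:m, :] -= 2v * (v' * R[k:m, :])      # row operations on A...
    Qt[k:m, :] -= 2v * (v' * Qt[k:m, :])    # ...repeated on the identity
end
@show norm(Qt' * triu(R) - A);    # Q * R recovers A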
The algorithm we have described is encapsulated in Function 3.4.1. There is one more refinement in it, however. As indicated by (3.4.2), applying a reflector $\mathbf{P}$ to a vector does not require forming the matrix $\mathbf{P}$ explicitly.
Function 3.4.1: QR factorization by Householder reflections (Julia).
"""
qrfact(A)
QR factorization by Householder reflections. Returns Q and R.
"""
function qrfact(A)
m, n = size(A)
Qt = diagm(ones(m))
R = float(copy(A))
for k in 1:n
z = R[k:m, k]
w = [-sign(z[1]) * norm(z) - z[1]; -z[2:end]]
nrmw = norm(w)
if nrmw < eps()
continue # already in place; skip this iteration
end
v = w / nrmw
# Apply the reflection to each relevant column of R and Q
for j in k:n
R[k:m, j] -= v * (2 * (v' * R[k:m, j]))
end
for j in 1:m
Qt[k:m, j] -= v * (2 * (v' * Qt[k:m, j]))
end
end
return Qt', triu(R)
end
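As a spot check (a sketch; it assumes the function above is defined and LinearAlgebra is loaded), we can confirm that the two factors behave as claimed:

A = rand(float(1:9), 6, 4)
Q, R = qrfact(A)
@show norm(Q' * Q - I);    # Q is orthogonal
@show norm(Q * R - A);     # the factorization reproduces A
@show istriu(R);           # R is upper triangular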
Function 3.4.1: QR factorization by Householder reflections (MATLAB).
function [Q, R] = qrfact(A)
% QRFACT   QR factorization by Householder reflections.
% (demo only--not efficient)
% Input:
%   A      m-by-n matrix
% Output:
%   Q, R   Q m-by-m orthogonal, R m-by-n upper triangular such that A = QR
    [m, n] = size(A);
    Q = eye(m);
    for k = 1:n
        z = A(k:m, k);
        v = [ -sign(z(1)) * norm(z) - z(1); -z(2:end) ];
        nrmv = norm(v);
        if nrmv < eps, continue, end    % nothing is done in this iteration
        v = v / nrmv;                   % removes v'*v in other formulas
        % Apply the reflection to each relevant column of A and Q
        for j = 1:n
            A(k:m, j) = A(k:m, j) - 2 * v * (v' * A(k:m, j));
        end
        for j = 1:m
            Q(k:m, j) = Q(k:m, j) - 2 * v * (v' * Q(k:m, j));
        end
    end
    Q = Q';
    R = triu(A);    % enforce exact triangularity
end
Function 3.4.1: QR factorization by Householder reflections (Python).
import numpy as np

def qrfact(A):
    """
    qrfact(A)

    QR factorization by Householder reflections. Returns Q and R.
    """
    m, n = A.shape
    Qt = np.eye(m)
    R = np.copy(A)
    for k in range(n):
        z = R[k:, k]
        w = np.hstack((-np.sign(z[0]) * np.linalg.norm(z) - z[0], -z[1:]))
        nrmw = np.linalg.norm(w)
        if nrmw < np.finfo(float).eps:
            continue    # already in place; skip this iteration
        v = w / nrmw
        # Apply the reflection to each relevant column of R and Q
        for j in range(k, n):
            R[k:, j] -= 2 * np.dot(v, R[k:, j]) * v
        for j in range(m):
            Qt[k:, j] -= 2 * np.dot(v, Qt[k:, j]) * v
    return Qt.T, np.triu(R)
3.4.3 Q-less QR and least squares

In Example 3.3.1, it was seen that the Q output of Julia's qr function is not a standard matrix. The reason is that Equation (3.3.8) shows that in order to solve the linear least-squares problem, all we need from $\mathbf{Q}$ is the computation of $\hat{\mathbf{Q}}^T\mathbf{b}$. Referring again to (3.4.2), the special structure of the reflectors is such that for this computation, we only need to apply updates like the reflection loops in Function 3.4.1 for each of the Householder vectors $\mathbf{v}$ that is constructed.

This observation leads to the idea of the Q-less QR factorization, in which the full or thin $\mathbf{Q}$ is never computed explicitly. This is the variant used by Julia's qr. The returned value Q used within Function 3.3.2 is of a special type that allows Julia to perform Q'*b efficiently for the least-squares solution.
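For instance, the following sketch solves a least-squares problem without ever forming $\mathbf{Q}$ as an explicit matrix (the data here are arbitrary):

using LinearAlgebra
A = randn(1000, 5);  b = randn(1000);
F = qr(A)                  # Q-less QR: reflectors stored compactly
z = F.Q' * b               # applies the reflectors to b without forming Q
x = F.R \ z[1:5]           # back substitution with the n-by-n triangular part
@show norm(x - A \ b);     # agrees with the built-in least-squares solve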
In Exercise 3.4.8 you are asked to derive the following result about the Q-less factorization.

Theorem 3.4.2: Computing the Q-less QR factorization of an $m\times n$ matrix by Householder reflections takes $\sim\bigl(2mn^2 - \tfrac{2}{3}n^3\bigr)$ flops.
The flop count quoted in Theorem 3.4.2 dominates the running time for least-squares solution via QR. Compared to the count from Theorem 3.2.3 for solution by the normal equations, the flops are essentially identical when $m=n$, but the QR solution is about twice the cost when $m\gg n$. The redeeming quality of the QR route is better stability, which we do not discuss here.
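A rough timing experiment is consistent with that accounting (a sketch; measured ratios vary with the machine and the BLAS library):

using LinearAlgebra
A = randn(20000, 400);  b = randn(20000);
t_qr = @elapsed qr(A) \ b                                 # least squares via QR
t_ne = @elapsed cholesky(Symmetric(A' * A)) \ (A' * b)    # normal equations
@show t_qr / t_ne;    # roughly 2 when m >> n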
3.4.4 Exercises

✍ Let $\mathbf{P}$ be a Householder reflector as in (3.4.1).
(a) Find a vector $\mathbf{u}$ such that $\mathbf{P}\mathbf{u} = -\mathbf{u}$. (Figure 3.4.1 may be of help.)
(b) What algebraic condition is necessary and sufficient for a vector $\mathbf{x}$ to satisfy $\mathbf{P}\mathbf{x}=\mathbf{x}$? In $n$ dimensions, how many such linearly independent vectors are there?
✍ Under certain circumstances, computing the vector $\mathbf{v}$ in (3.4.4) could lead to subtractive cancellation, which is why the vector w in Function 3.4.1 is computed the way it is. Devise an example that causes subtractive cancellation if (3.4.4) is used.
✍ Suppose QR factorization is used to compute the solution of a square linear system $\mathbf{A}\mathbf{x}=\mathbf{b}$; i.e., let $m=n$.
(a) Find an asymptotic flop count for this procedure, and compare it to the LU factorization algorithm.
(b) Show that $\kappa_2(\mathbf{A}) = \kappa_2(\mathbf{R})$.
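A quick numerical sanity check of part (b), sketched in Julia (not a substitute for the proof):

using LinearAlgebra
A = randn(6, 6)
R = qr(A).R
@show cond(A), cond(R);    # 2-norm condition numbers agree to roundoff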
Another algorithmic technique for orthogonally introducing zeros into a matrix is the Givens rotation. Given a 2-vector $[\alpha,\, \beta]$, one can define an angle θ such that
$$
\begin{bmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{bmatrix}
\begin{bmatrix} \alpha \\ \beta \end{bmatrix}
=
\begin{bmatrix} \sqrt{\alpha^2 + \beta^2} \\ 0 \end{bmatrix}.
$$

(a) ✍ Given α and β, show how to compute $\cos\theta$ and $\sin\theta$ without evaluating any trig functions.
(b) ⌨ Given the vector $\mathbf{z}=[1\;2\;3\;4\;5]^T$, use Julia to find a sequence of Givens rotations that transforms $\mathbf{z}$ into the vector $\|\mathbf{z}\|\mathbf{e}_1$. (Hint: You can operate only on pairs of elements at a time, introducing a zero at the lower of the two positions.)
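To experiment with part (b), a single Givens rotation can be formed and tested as follows (a sketch; it uses the trig-free normalization that part (a) asks you to derive):

using LinearAlgebra
α, β = 3.0, 4.0
r = hypot(α, β)            # sqrt(α^2 + β^2), computed without overflow
G = [α β; -β α] / r        # cos(θ) = α/r, sin(θ) = β/r
@show G * [α; β];          # ≈ [r, 0]
@show opnorm(G' * G - I);  # G is orthogonal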