The QR factorization
Tobin A. Driscoll, Richard J. Braun
Sets of vectors satisfying a certain property are useful both theoretically and computationally.
Definition 3.3.1 (Orthogonal vectors)
Two vectors $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^n$ are orthogonal vectors if $\mathbf{u}^T\mathbf{v}=0$. We say that a collection of vectors $\mathbf{q}_1,\ldots,\mathbf{q}_k$ is orthogonal if

$$
i \neq j \quad \Rightarrow \quad \mathbf{q}_i^T\mathbf{q}_j = 0. \qquad (3.3.1)
$$

If (3.3.1) applies and also $\mathbf{q}_i^T\mathbf{q}_i=1$ for all $i=1,\ldots,k$, we say they are orthonormal vectors.
In two and three dimensions, orthogonality is the same as perpendicularity.
Orthogonal vectors simplify inner products. For example, if $\mathbf{q}_1$ and $\mathbf{q}_2$ are orthogonal, then

$$
\| \mathbf{q}_1 - \mathbf{q}_2 \|_2^2 = (\mathbf{q}_1-\mathbf{q}_2)^T(\mathbf{q}_1-\mathbf{q}_2)
= \mathbf{q}_1^T\mathbf{q}_1 - 2 \mathbf{q}_1^T\mathbf{q}_2 + \mathbf{q}_2^T\mathbf{q}_2
= \|\mathbf{q}_1\|_2^2 + \|\mathbf{q}_2\|_2^2. \qquad (3.3.2)
$$

As in the rest of this chapter, we will be using the 2-norm exclusively.
Equation (3.3.2) is the key to the computational attractiveness of orthogonality. Figure 3.3.1 shows how nonorthogonal vectors can allow a multidimensional version of subtractive cancellation, in which $\|\mathbf{x}-\mathbf{y}\|$ is much smaller than $\|\mathbf{x}\|$ and $\|\mathbf{y}\|$. As the figure illustrates, orthogonal vectors do not allow this phenomenon. By (3.3.2), the magnitude of a difference or sum of orthogonal vectors is never smaller than the magnitudes of the original vectors.
Figure 3.3.1: Nonorthogonal vectors can cause cancellation when subtracted, but orthogonal vectors never do.
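A small numerical illustration of this point (a sketch of ours, not one of the text's examples): subtracting two nearly parallel unit vectors suffers severe cancellation, while subtracting orthonormal vectors cannot.

using LinearAlgebra
x = [1.0, 0.0]
y = [cos(1e-6), sin(1e-6)]         # nearly parallel to x
@show norm(x - y);                  # about 1e-6: most of the magnitude cancels
q1 = [1.0, 0.0]; q2 = [0.0, 1.0]    # an orthonormal pair
@show norm(q1 - q2);                # sqrt(2), as predicted by (3.3.2)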
3.3.1 Orthogonal and ONC matrices

Statements about orthogonal vectors are often made more easily in matrix form. Let $\mathbf{Q}$ be an $n\times k$ matrix whose columns $\mathbf{q}_1, \ldots, \mathbf{q}_k$ are orthogonal vectors. The orthogonality conditions (3.3.1) become simply that $\mathbf{Q}^T\mathbf{Q}$ is a diagonal matrix, since
$$
\mathbf{Q}^T \mathbf{Q} =
\begin{bmatrix}
\mathbf{q}_1^T \\ \mathbf{q}_2^T \\ \vdots \\ \mathbf{q}_k^T
\end{bmatrix}
\begin{bmatrix}
\mathbf{q}_1 & \mathbf{q}_2 & \cdots & \mathbf{q}_k
\end{bmatrix}
=
\begin{bmatrix}
\mathbf{q}_1^T\mathbf{q}_1 & \mathbf{q}_1^T\mathbf{q}_2 & \cdots & \mathbf{q}_1^T\mathbf{q}_k \\
\mathbf{q}_2^T\mathbf{q}_1 & \mathbf{q}_2^T\mathbf{q}_2 & \cdots & \mathbf{q}_2^T\mathbf{q}_k \\
\vdots & \vdots & & \vdots \\
\mathbf{q}_k^T\mathbf{q}_1 & \mathbf{q}_k^T\mathbf{q}_2 & \cdots & \mathbf{q}_k^T\mathbf{q}_k
\end{bmatrix}.
$$

If the columns of $\mathbf{Q}$ are orthonormal, then $\mathbf{Q}^T\mathbf{Q}$ is the $k\times k$ identity matrix. This is such an important property that we will break with common practice here and give this type of matrix a name: a matrix whose columns form an orthonormal collection of vectors is called an ONC matrix.
Of particular interest is a square ONC matrix, which is called an orthogonal matrix.[1]
Orthogonal matrices have properties beyond Theorem 3.3.1. For instance, since an orthogonal $\mathbf{Q}$ is an ONC matrix, $\mathbf{Q}^T\mathbf{Q}=\mathbf{I}$; all three matrices in this identity are $n\times n$, so $\mathbf{Q}^{-1}=\mathbf{Q}^T$. The proofs of the other statements are left to the exercises.
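As a quick numerical illustration (a sketch of ours, with arbitrary variable names): the Q factor from the QR factorization of a square random matrix is orthogonal, so its transpose acts as its inverse.

using LinearAlgebra
Q = Matrix(qr(randn(5, 5)).Q)    # a 5-by-5 orthogonal matrix
@show opnorm(Q' * Q - I);         # near machine epsilon
@show opnorm(Q' - inv(Q));        # Q' and inv(Q) agree up to roundoff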
3.3.2 Orthogonal factorization

Now we come to another important way to factor a matrix: the QR factorization. As we will show below, the QR factorization plays a role in linear least squares analogous to the role of LU factorization in linear systems.
In most introductory books on linear algebra, the QR factorization is derived through a process known as Gram–Schmidt orthogonalization . However, while it is an important tool for theoretical work, the Gram–Schmidt process is numerically unstable. We will consider an alternative construction in Computing QR factorizations .
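For reference, here is a minimal Julia sketch of classical Gram–Schmidt (our own illustration, assuming the matrix has full column rank). It produces a thin QR factorization in exact arithmetic, but it can lose orthogonality badly in floating point, which is why a different construction is developed later.

using LinearAlgebra
# Classical Gram-Schmidt orthogonalization: illustrative only, not numerically robust.
function cgs(A)
    m, n = size(A)
    Q = zeros(m, n)
    R = zeros(n, n)
    for j in 1:n
        v = A[:, j]
        for i in 1:j-1
            R[i, j] = dot(Q[:, i], A[:, j])   # component of column j along q_i
            v -= R[i, j] * Q[:, i]            # remove that component
        end
        R[j, j] = norm(v)                      # what remains becomes q_j
        Q[:, j] = v / R[j, j]
    end
    return Q, R
end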
Any real $m\times n$ matrix $\mathbf{A}$ with $m\ge n$ can be written as $\mathbf{A}=\mathbf{Q}\mathbf{R}$, where $\mathbf{Q}$ is an $m\times m$ orthogonal matrix and $\mathbf{R}$ is an $m\times n$ upper triangular matrix; this is the (full) QR factorization of $\mathbf{A}$. When $m$ is much larger than $n$, which is often the case, there is a compressed form of the factorization that is more efficient. In the product

$$
\mathbf{A} =
\begin{bmatrix}
\mathbf{q}_1 & \mathbf{q}_2 & \cdots & \mathbf{q}_m
\end{bmatrix}
\begin{bmatrix}
r_{11} & r_{12} & \cdots & r_{1n} \\
0 & r_{22} & \cdots & r_{2n} \\
\vdots & & \ddots & \vdots\\
0 & 0 & \cdots & r_{nn} \\
0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0
\end{bmatrix},
$$

the vectors $\mathbf{q}_{n+1},\ldots,\mathbf{q}_m$ always get multiplied by zero. Nothing about $\mathbf{A}$ is lost if we delete them and reduce the factorization to the equivalent form
$$
\mathbf{A} =
\begin{bmatrix}
\mathbf{q}_1 & \mathbf{q}_2 & \cdots & \mathbf{q}_n
\end{bmatrix}
\begin{bmatrix}
r_{11} & r_{12} & \cdots & r_{1n} \\
0 & r_{22} & \cdots & r_{2n} \\
\vdots & & \ddots & \vdots\\
0 & 0 & \cdots & r_{nn}
\end{bmatrix} = \hat{\mathbf{Q}} \hat{\mathbf{R}}.
$$

Definition 3.3.4 (Thin QR factorization)
The thin QR factorization is $\mathbf{A} = \hat{\mathbf{Q}} \hat{\mathbf{R}}$, where $\hat{\mathbf{Q}}$ is $m\times n$ and ONC, and $\hat{\mathbf{R}}$ is $n\times n$ and upper triangular.
Example 3.3.1 (QR factorization)
Example 3.3.1 Julia provides access to both the thin and full forms of the QR factorization.
A = rand(1.:9., 6, 4)
@show m,n = size(A);
(m, n) = size(A) = (6, 4)
Here is a standard call:
Q, R = qr(A)
6×6 LinearAlgebra.QRCompactWYQ{Float64, Matrix{Float64}, Matrix{Float64}}
4×4 Matrix{Float64}:
-17.0587 -12.838 -11.9001 -12.1346
0.0 8.89863 2.16065 0.36146
0.0 0.0 4.44071 -2.5183
0.0 0.0 0.0 -2.29784
If you look carefully, you see that we seemingly got a full $\mathbf{Q}$ but a thin $\mathbf{R}$. However, the $\mathbf{Q}$ above is not a standard matrix type. If you convert it to a true matrix, then it reverts to the thin form.
Tip
To enter the accented character Q̂, type Q\hat followed by Tab.

Q̂ = Matrix(Q)
6×4 Matrix{Float64}:
-0.468968 0.334814 -0.0684959 -0.442051
-0.351726 0.0544507 0.382097 0.576839
-0.527589 -0.311643 -0.361428 -0.348326
-0.117242 0.617494 0.51132 -0.279296
-0.527589 -0.42402 0.368817 0.139263
-0.293105 0.476154 -0.56675 0.503106
We can test that $\mathbf{Q}$ is an orthogonal matrix.
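One way to do so (a sketch: we materialize the full 6-by-6 factor and measure how far $\mathbf{Q}^T\mathbf{Q}$ is from the identity):

QQ = Q * Matrix(1.0I, 6, 6)    # materialize the full orthogonal factor
opnorm(QQ' * QQ - I)            # should be on the order of machine epsilon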
The thin $\hat{\mathbf{Q}}$ cannot be an orthogonal matrix, because it is not square, but it is still ONC:
Q̂' * Q̂ - I
4×4 Matrix{Float64}:
-1.11022e-16 9.51871e-17 -2.86165e-17 5.50694e-17
9.51871e-17 0.0 -2.55659e-17 3.04621e-17
-2.86165e-17 -2.55659e-17 0.0 1.13916e-16
5.50694e-17 3.04621e-17 1.13916e-16 2.22045e-16
Example 3.3.1 MATLAB provides access to both the thin and full forms of the QR factorization.
A = magic(5);
A = A(:, 1:4);
[m, n] = size(A)
Here is the full form:
[Q, R] = qr(A);
szQ = size(Q), szR = size(R)
We can test that $\mathbf{Q}$ is an orthogonal matrix:
QTQ = Q' * Q
norm(QTQ - eye(m))
With a second input argument given to qr
, the thin form is returned. (This is usually the one we want in practice.)
[Q_hat, R_hat] = qr(A, 0);
szQ_hat = size(Q_hat), szR_hat = size(R_hat)
Now $\hat{\mathbf{Q}}$ cannot be an orthogonal matrix, because it is not square, but it is still ONC. Mathematically, $\hat{\mathbf{Q}}^T \hat{\mathbf{Q}}$ is a $4\times 4$ identity matrix.
Example 3.3.1 Python's NumPy provides access to both the thin and full forms of the QR factorization.
A = 1.0 + floor(9 * random.rand(6,4))
A.shape
Here is the full form:
from numpy.linalg import qr
Q, R = qr(A, "complete")
print(f"size of Q is {Q.shape}")
print("R:")
print(R)
size of Q is (6, 6)
R:
[[-14.76482306 -10.83656738 -12.39432395 -13.88435196]
[ 0. -10.12762595 -5.20241107 -4.89169769]
[ 0. 0. -7.63646862 -5.03691218]
[ 0. 0. 0. -3.99068666]
[ 0. 0. 0. 0. ]
[ 0. 0. 0. 0. ]]
We can test that $\mathbf{Q}$ is an orthogonal matrix:
print(f"norm of (Q^T Q - I) is {norm(Q.T @ Q - eye(6)):.3e}")
norm of (Q^T Q - I) is 7.294e-16
The default for qr
, and the one you usually want, is the thin form.
Q_hat, R_hat = qr(A)
print(f"size of Q_hat is {Q_hat.shape}")
print("R_hat:")
print(R_hat)
size of Q_hat is (6, 4)
R_hat:
[[-14.76482306 -10.83656738 -12.39432395 -13.88435196]
[ 0. -10.12762595 -5.20241107 -4.89169769]
[ 0. 0. -7.63646862 -5.03691218]
[ 0. 0. 0. -3.99068666]]
Now $\hat{\mathbf{Q}}$ cannot be an orthogonal matrix, because it is not square, but it is still ONC. Mathematically, $\hat{\mathbf{Q}}^T \hat{\mathbf{Q}}$ is a $4\times 4$ identity matrix.
print(f"norm of (Q_hat^T Q_hat - I) is {norm(Q_hat.T @ Q_hat - eye(4)):.3e}")
norm of (Q_hat^T Q_hat - I) is 6.135e-16
3.3.3 Least squares and QR

If we substitute the thin factorization (3.3.6) into the normal equations (3.2.3), we can simplify expressions a great deal.
$$
\begin{split}
\mathbf{A}^T\mathbf{A} \mathbf{x} &= \mathbf{A}^T \mathbf{b}, \\
\hat{\mathbf{R}}^T \hat{\mathbf{Q}}^T \hat{\mathbf{Q}} \hat{\mathbf{R}} \mathbf{x} &= \hat{\mathbf{R}}^T \hat{\mathbf{Q}}^T \mathbf{b}, \\
\hat{\mathbf{R}}^T \hat{\mathbf{R}} \mathbf{x} &= \hat{\mathbf{R}}^T \hat{\mathbf{Q}}^T \mathbf{b}.
\end{split}
$$

In order to have the normal equations be well posed, we require that $\mathbf{A}$ is not rank-deficient (as proved in Theorem 3.2.2). This is enough to guarantee that $\hat{\mathbf{R}}$ is nonsingular (see Exercise 3.3.4). Therefore, its transpose is nonsingular as well, and we arrive at
$$
\hat{\mathbf{R}} \mathbf{x}=\hat{\mathbf{Q}}^T \mathbf{b}.
$$

Algorithm 3.3.1 (Solution of linear least squares by thin QR)
1. Compute the thin QR factorization $\hat{\mathbf{Q}}\hat{\mathbf{R}}=\mathbf{A}$.
2. Compute $\mathbf{z} = \hat{\mathbf{Q}}^T\mathbf{b}$.
3. Solve the $n\times n$ linear system $\hat{\mathbf{R}}\mathbf{x} = \mathbf{z}$ for $\mathbf{x}$.

This algorithm is implemented in Function 3.3.2. It is essentially the algorithm used internally by Julia when A\b is called. The execution time is dominated by the factorization, the most common method for which is described in Computing QR factorizations.
Solution of least squares by QR factorization
"""
lsqrfact(A, b)
Solve a linear least-squares problem by QR factorization. Returns
the minimizer of ||b-Ax||.
"""
function lsqrfact(A, b)
Q, R = qr(A)
c = Q' * b
x = backsub(R, c)
return x
end
Solution of least squares by QR factorization
function x = lsqrfact(A, b)
% LSQRFACT Solve linear least squares by QR factorization.
% Input:
%   A    coefficient matrix (m by n, m > n)
%   b    right-hand side (m by 1)
% Output:
%   x    minimizer of || b - Ax ||
    [Q, R] = qr(A, 0);    % thin factorization
    c = Q' * b;
    x = backsub(R, c);
end
Solution of least squares by QR factorization
import numpy as np

def lsqrfact(A, b):
    """
    lsqrfact(A, b)

    Solve a linear least squares problem by QR factorization. Returns the
    minimizer of ||b-Ax||.
    """
    Q, R = np.linalg.qr(A)    # thin QR: Q is m-by-n, R is n-by-n upper triangular
    c = Q.T @ b               # apply Q^T to the right-hand side
    x = backsub(R, c)         # back substitution on the triangular system
    return x
The solution of least-squares problems via QR factorization is more stable than when the normal equations are formulated and solved directly.
Example 3.3.2 (Stability of least-squares via QR)
Example 3.3.2 We’ll repeat the experiment of Example 3.2.1 , which exposed instability in the normal equations.
t = range(0, 3, 400)
f = [ x -> sin(x)^2, x -> cos((1 + 1e-7) * x)^2, x -> 1. ]
A = [ f(t) for t in t, f in f ]
x = [1., 2, 1]
b = A * x;
The error in the solution by Function 3.3.2 is similar to the bound predicted by the condition number.
observed_error = norm(FNC.lsqrfact(A, b) - x) / norm(x);
@show observed_error;
@show error_bound = cond(A) * eps();
observed_error = 4.665273501889628e-9
error_bound = cond(A) * eps() = 4.053030228488391e-9
Example 3.3.2 We’ll repeat the experiment of Example 3.2.1 , which exposed instability in the normal equations.
t = linspace(0, 3, 400)';
A = [ sin(t).^2, cos((1+1e-7)*t).^2, t.^0 ];
x = [1; 2; 1];
b = A * x;
The error in the solution by Function 3.3.2 is similar to the bound predicted by the condition number.
observed_error = norm(lsqrfact(A, b) - x) / norm(x)
error_bound = cond(A) * eps
Example 3.3.2 We’ll repeat the experiment of Example 3.2.1 , which exposed instability in the normal equations.
t = linspace(0, 3, 400)
A = array([ [sin(t)**2, cos((1+1e-7)*t)**2, 1] for t in t ])
x = array([1, 2, 1])
b = A @ x
The error in the solution by Function 3.3.2 is similar to the bound predicted by the condition number.
print(f"observed error: {norm(FNC.lsqrfact(A, b) - x) / norm(x):.3e}")
print(f"conditioning bound: {cond(A) * finfo(float).eps:.3e}")
observed error: 5.186e-09
conditioning bound: 4.053e-09
3.3.4 Exercises

⌨ Let $t_0,\ldots,t_m$ be $m+1$ equally spaced points in $[-1,1]$. Let $\mathbf{A}$ be the matrix in (3.1.2) for $m=400$ and fitting by polynomials of degree less than 5. Find the thin QR factorization of $\mathbf{A}$, and, on a single graph, plot every column of $\hat{\mathbf{Q}}$ as a function of the vector $t$.
✍ Prove that if the $m\times n$ ($m\ge n$) matrix $\mathbf{A}$ is not rank-deficient, then the factor $\hat{\mathbf{R}}$ of the thin QR factorization is invertible. (Hint: Suppose on the contrary that $\hat{\mathbf{R}}$ is singular. Show using the factored form of $\mathbf{A}$ that this would imply that $\mathbf{A}$ is rank-deficient.)
✍ The matrix $\mathbf{P}=\hat{\mathbf{Q}} \hat{\mathbf{Q}}^T$ derived from the thin QR factorization has some interesting and important properties.

(a) Prove that $\mathbf{P}=\mathbf{A}\mathbf{A}^+$.

(b) Prove that $\mathbf{P}^2=\mathbf{P}$. (This property defines a projection matrix.)

(c) Any vector $\mathbf{x}$ may be written trivially as $\mathbf{x}=\mathbf{u}+\mathbf{v}$, where $\mathbf{u}=\mathbf{P}\mathbf{x}$ and $\mathbf{v}=(\mathbf{I}-\mathbf{P})\mathbf{x}$. Prove that $\mathbf{u}$ and $\mathbf{v}$ are orthogonal. (Together with part (b), this means that $\mathbf{P}$ is an orthogonal projector.)
[1] Confusingly, a square matrix whose columns are orthogonal is not necessarily an orthogonal matrix; the columns must be orthonormal, which is a stricter condition.
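For a concrete instance of this distinction (a quick sketch): a diagonal matrix with nonzero, non-unit entries has orthogonal columns, yet it is not an orthogonal matrix.

using LinearAlgebra
Q = [2.0 0.0; 0.0 3.0]    # columns are orthogonal but not unit length
Q' * Q                     # gives diag(4, 9), not the identity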