
LU factorization

A major tool in numerical linear algebra is to factor a given matrix into terms that are individually easier to deal with than the original. In this section we derive a means to express a square matrix using triangular factors, which will allow us to solve a linear system using forward and backward substitution.

2.4.1 Outer products

Our derivation of the factorization hinges on an expression of matrix products in terms of vector outer products. If $\mathbf{u}\in\mathbb{R}^m$ and $\mathbf{v}\in\mathbb{R}^n$, then the outer product of these vectors is the $m\times n$ matrix

$$
\mathbf{u} \mathbf{v}^T = \begin{bmatrix} u_1 v_1 & u_1 v_2 & \cdots & u_1 v_n \\ u_2 v_1 & u_2 v_2 & \cdots & u_2 v_n \\ \vdots & \vdots & & \vdots \\ u_m v_1 & u_m v_2 & \cdots & u_m v_n \end{bmatrix}.
$$
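As a quick numerical illustration (a sketch using NumPy, which the text itself does not use), the outer product of a vector in $\mathbb{R}^3$ with a vector in $\mathbb{R}^2$ is a $3\times 2$ matrix:

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])   # u in R^3
v = np.array([4.0, 5.0])        # v in R^2

# Outer product u v^T: the (i, j) entry is u_i * v_j
A = np.outer(u, v)
print(A.shape)   # (3, 2)
```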

We illustrate the connection of outer products to matrix multiplication by a small example.

It is not hard to derive the following generalization of Example 2.4.1 to all matrix products.
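The generalization expresses a matrix product $\mathbf{A}\mathbf{B}$ as the sum of outer products of the columns of $\mathbf{A}$ with the corresponding rows of $\mathbf{B}$. A minimal NumPy check of this identity (the array sizes and random entries here are our own arbitrary choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((3, 5))

# Sum of outer products: column k of A times row k of B, over all k
C = sum(np.outer(A[:, k], B[k, :]) for k in range(A.shape[1]))
```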

2.4.2 Triangular product

Equation (2.4.4) has some interesting structure for the product $\mathbf{L}\mathbf{U}$, where $\mathbf{L}$ is $n\times n$ and lower triangular (i.e., zero above the main diagonal) and $\mathbf{U}$ is $n\times n$ and upper triangular (zero below the diagonal).

Let the columns of $\mathbf{L}$ be written as $\boldsymbol{\ell}_k$ and the rows of $\mathbf{U}$ be written as $\mathbf{u}_k^T$. Then the first row of $\mathbf{L}\mathbf{U}$ is

$$
\mathbf{e}_1^T \sum_{k=1}^n \boldsymbol{\ell}_k \mathbf{u}_k^T = \sum_{k=1}^n (\mathbf{e}_1^T \boldsymbol{\ell}_k) \mathbf{u}_k^T = L_{11} \mathbf{u}_1^T.
$$

Likewise, the first column of $\mathbf{L}\mathbf{U}$ is

$$
\left( \sum_{k=1}^n \boldsymbol{\ell}_k \mathbf{u}_k^T \right) \mathbf{e}_1 = \sum_{k=1}^n \boldsymbol{\ell}_k (\mathbf{u}_k^T \mathbf{e}_1) = U_{11}\boldsymbol{\ell}_1.
$$
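Both identities are easy to confirm numerically. A sketch using NumPy, with randomly generated triangular factors standing in for a concrete example:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
L = np.tril(rng.standard_normal((n, n)))   # lower triangular
U = np.triu(rng.standard_normal((n, n)))   # upper triangular
P = L @ U

# First row of LU is L_{11} times the first row of U
assert np.allclose(P[0, :], L[0, 0] * U[0, :])
# First column of LU is U_{11} times the first column of L
assert np.allclose(P[:, 0], U[0, 0] * L[:, 0])
```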

These two calculations are enough to derive one of the most important algorithms in scientific computing.

2.4.3 Triangular factorization

Our goal is to factor a given $n\times n$ matrix $\mathbf{A}$ as the triangular product $\mathbf{A}=\mathbf{L}\mathbf{U}$. It turns out that we have $n^2+n$ total nonzero unknowns in the two triangular matrices, so we choose $n$ of them arbitrarily as follows.

We will require that L\mathbf{L} be a unit lower triangular matrix.

We have arrived at the linchpin of solving linear systems.

The outer product algorithm for LU factorization seen in Example 2.4.3 is coded as Function 2.4.1.
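For readers following along in Python rather than Julia, here is a sketch of the outer-product algorithm under the assumptions stated so far; the function name `lufact` and the details below are ours, and this is not the text's Function 2.4.1 itself. At step $k$, row $k$ of $\mathbf{U}$ and column $k$ of $\mathbf{L}$ are read off, and their outer product is subtracted from the working matrix:

```python
import numpy as np

def lufact(A):
    """LU factorization by outer products (no pivoting); sketch only.

    Assumes every pivot encountered is nonzero; as the text notes,
    the factorization without row swaps can fail to exist.
    """
    n = A.shape[0]
    Ak = np.array(A, dtype=float)   # working copy, reduced in place
    L = np.eye(n)                   # unit lower triangular
    U = np.zeros((n, n))
    for k in range(n):
        U[k, :] = Ak[k, :]                 # row k of U
        L[:, k] = Ak[:, k] / U[k, k]       # column k of L (diagonal entry 1)
        Ak -= np.outer(L[:, k], U[k, :])   # remove this term from the sum
    return L, U
```

Multiplying the returned factors reproduces the original matrix, which is the natural sanity check on the sketch.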

2.4.4 Gaussian elimination and linear systems

In your first matrix algebra course, you probably learned a triangularization technique called Gaussian elimination or row elimination to solve a linear system $\mathbf{A}\mathbf{x}=\mathbf{b}$. In most presentations, you form an augmented matrix $[\mathbf{A}\;\mathbf{b}]$ and perform row operations until the system reaches upper triangular form, then finish with backward substitution. LU factorization is equivalent to Gaussian elimination in which no row swaps are performed, and the elimination procedure produces the factors if you keep track of the row multipliers appropriately.

Like Gaussian elimination, the primary use of LU factorization is to solve a linear system. It reduces a given linear system to two triangular ones. From this, solving $\mathbf{A}\mathbf{x}=\mathbf{b}$ follows immediately from associativity:

$$
\mathbf{b} = \mathbf{A} \mathbf{x} = (\mathbf{L} \mathbf{U}) \mathbf{x} = \mathbf{L} (\mathbf{U} \mathbf{x}).
$$

Defining $\mathbf{z} = \mathbf{U} \mathbf{x}$ leads to the following.
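The resulting two-stage solve, a forward substitution for $\mathbf{L}\mathbf{z}=\mathbf{b}$ followed by a backward substitution for $\mathbf{U}\mathbf{x}=\mathbf{z}$, can be sketched as follows; the helper names `forwardsub` and `backsub` are our own stand-ins for the text's substitution routines:

```python
import numpy as np

def forwardsub(L, b):
    """Solve Lz = b for lower triangular L by forward substitution."""
    n = len(b)
    z = np.zeros(n)
    for i in range(n):
        z[i] = (b[i] - L[i, :i] @ z[:i]) / L[i, i]
    return z

def backsub(U, z):
    """Solve Ux = z for upper triangular U by backward substitution."""
    n = len(z)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (z[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x
```

Given factors with $\mathbf{A}=\mathbf{L}\mathbf{U}$, the solution is then `x = backsub(U, forwardsub(L, b))`.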

A key advantage of the factorization point of view is that it depends only on the matrix $\mathbf{A}$. If systems are to be solved for a single $\mathbf{A}$ but multiple different versions of $\mathbf{b}$, then the factorization approach is more efficient, as we'll see in Efficiency of matrix computations.

As noted in the descriptions of Function 2.4.1 and Algorithm 2.4.2, the LU factorization as we have seen it so far is not stable for all matrices. In fact, it does not always even exist. The missing element is the row swapping allowed in Gaussian elimination. We will address these issues in Row pivoting.

2.4.5 Exercises