Predicting how long an algorithm will take to solve a particular problem, on a particular computer, as written in a particular way in a particular programming language, is an enormously difficult undertaking. It’s more practical to predict how the required time will scale as a function of the size of the problem. In the case of a linear system of equations, the problem size is n, the number of equations and variables in the system. Because expressions of computational time are necessarily approximate, it’s customary to suppress all but the term that is dominant as n→∞. We first need to build some terminology for these expressions.
One immediate consequence is that $f\sim g$ implies $f=O(g)$.[1] (If $f/g\to 1$, then $|f/g|$ stays bounded, by 2 say, once the argument is large enough.)
It’s conventional to use asymptotic notation that is as specific as possible. For instance, while it is true that $n^2+n=O(n^{10})$, it’s more informative, and usually expected, to say $n^2+n=O(n^2)$. There are additional notations that enforce this requirement strictly, but we will just stick to the informal understanding.
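To see why $n^2+n=O(n^2)$, note the one-line limit

$$\lim_{n\to\infty} \frac{n^2+n}{n^2} = \lim_{n\to\infty} \left( 1 + \frac{1}{n} \right) = 1,$$

so $n^2+n \sim n^2$, and by the consequence noted above, $n^2+n = O(n^2)$.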
There is a memorable way to use asymptotic notation to simplify sums:
$$\begin{aligned}
\sum_{k=1}^n k &\sim \frac{n^2}{2} = O(n^2), & \text{as } n\to\infty,\\
\sum_{k=1}^n k^2 &\sim \frac{n^3}{3} = O(n^3), & \text{as } n\to\infty,\\
&\ \ \vdots\\
\sum_{k=1}^n k^p &\sim \frac{n^{p+1}}{p+1} = O(n^{p+1}), & \text{as } n\to\infty.
\end{aligned} \tag{2.5.5}$$
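A quick numerical check in Python (our own illustration) shows the $p=2$ case converging:

import numpy as np

# Compare the exact sum 1^2 + 2^2 + ... + n^2 with its asymptotic
# approximation n^3 / 3; the ratio should approach 1 as n grows.
for n in [10, 100, 1000, 10000]:
    exact = np.sum(np.arange(1, n + 1, dtype=float) ** 2)
    print(f"n = {n:6d}   exact / (n^3/3) = {exact / (n**3 / 3):.4f}")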
Traditionally, in numerical linear algebra, we count floating-point operations, or flops, to measure an algorithm’s running time; each scalar addition, subtraction, multiplication, or division counts as one flop.
Suppose that the running time $t$ of an algorithm obeys a function that is $O(n^p)$. For sufficiently large $n$, $t\approx Cn^p$ for a constant $C$ should be a good approximation. Hence

$$t \approx Cn^p \qquad \Longleftrightarrow \qquad \log t \approx p(\log n) + \log C.$$

So, plotting $\log t$ as a function of $\log n$ should produce points lying close to a straight line of slope $p$.
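For example, here is a minimal Python sketch of this idea (the sizes and the choice of matrix-vector multiplication, an $O(n^2)$ operation, are ours for illustration):

import time
import numpy as np

# Fit a line to (log n, log t) for matrix-vector multiplication;
# the slope estimates the exponent p, which should be near 2.
sizes = [500, 1000, 2000, 4000]
times = []
for n in sizes:
    A = np.random.randn(n, n)
    x = np.random.randn(n)
    start = time.perf_counter()
    for _ in range(20):     # repeat to get a measurable elapsed time
        A @ x
    times.append((time.perf_counter() - start) / 20)

p = np.polyfit(np.log(sizes), np.log(times), 1)[0]
print(f"estimated exponent p ≈ {p:.2f}")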
Julia

Take forwardsub, for instance. It has a single flop in line 10. Line 12 computes
sum(L[i, j] * x[j] for j in 1:i-1)
This line requires $i-1$ multiplications and $i-2$ additions, for a total of $2i-3$ flops. Line 13 adds two more flops, making $2i-1$ in each pass through the loop. These lines are performed within a loop as $i$ ranges from 2 to $n$, so the total count is
$$1+\sum_{i=2}^n (2i-1) = \sum_{i=1}^n (2i-1) = -n + 2\sum_{i=1}^n i.$$
MATLAB

Take forwardsub, for instance. All of the flops are in line 12:
x(i) = (b(i) - L(i, 1:i-1) * x(1:i-1)) / L(i, i);
The first operation to be performed in this line is the inner product L(i, 1:i-1) * x(1:i-1), which requires $i-1$ multiplications and $i-2$ additions, for a total of $2i-3$ flops. Then there is one subtraction and one division, which adds two more flops, for a total of $2i-1$ per pass. This line executes within a loop as $i$ ranges from 1 to $n$, so the total count is
$$\sum_{i=1}^n (2i-1) = -n + 2\sum_{i=1}^n i.$$
Python

Take forwardsub, for instance. Line 11 computes
s = L[i, :i] @ x[:i]
This is the inner product between the first $i$ elements of the $i$th row of L and the first $i$ elements of x, requiring $i$ multiplications and $i-1$ additions. Line 12 adds two more flops, for a total of $2i+1$ in each pass through the loop. In this loop, $i$ ranges from 0 to $n-1$, so the total count is
$$\sum_{i=0}^{n-1}(2i+1) = \sum_{i=1}^n (2i-1) = -n + 2\sum_{i=1}^n i.$$
It is not hard to find an exact formula for the sum above, but we use the asymptotic expression (2.5.5) to simplify it to $\sim n^2$. After all, since flop counting is only an approximation of true running time, why bother with the more complicated exact expression? An analysis of backward substitution yields the same result.
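To make the count concrete, here is a short self-contained Python sketch (our own illustration; forwardsub_count is a hypothetical name) that performs forward substitution while tallying flops; the tally comes out to exactly $n^2$, consistent with $\sim n^2$:

import numpy as np

def forwardsub_count(L, b):
    """Solve Lx = b for lower triangular L, tallying flops as counted in the text."""
    n = len(b)
    x = np.zeros(n)
    flops = 0
    for i in range(n):
        s = L[i, :i] @ x[:i]            # i multiplications and i-1 additions
        x[i] = (b[i] - s) / L[i, i]     # one subtraction and one division
        flops += 2 * i + 1
    return x, flops

n = 50
L = np.tril(np.random.randn(n, n)) + n * np.eye(n)   # nonsingular lower triangular
b = np.random.randn(n)
x, flops = forwardsub_count(L, b)
print(flops == n**2)            # True: the sum of 2i+1 over i = 0, ..., n-1 is n^2
print(np.allclose(L @ x, b))    # the solution is correct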
Before counting flops for the LU factorization, we have to admit that Function 2.4.1 is not written as economically as it could be. Recall from our motivating example in Example 2.4.3 that we zero out the first row and column of A with the first outer product, the second row and column with the second outer product, and so on. There is no good reason to do multiplications and additions with values known to be zero.
Julia
Suppose we replace lines 14–18 of lufact in Algorithm 2.4.1 with
14 for k in 1:n-1
15     U[k, k:n] = Aₖ[k, k:n]
16     L[k:n, k] = Aₖ[k:n, k] / U[k, k]
17     Aₖ[k:n, k:n] -= L[k:n, k] * U[k, k:n]'
18 end
We will use the following handy fact: the index range k:n has $n-k+1$ elements.
Line 16 above divides each element of the vector Aₖ[k:n, k] by a scalar. Hence, the number of flops equals the length of the vector, which is $n-k+1$.

Line 17 has an outer product followed by a matrix subtraction. The definition (A.5) of the outer product makes it clear that the computation takes one flop (multiplication) per element of the result, which here is $(n-k+1)^2$ flops. The number of subtractions is identical.
Altogether, the factorization takes

$$\sum_{k=1}^{n-1} \left[ (n-k+1) + 2(n-k+1)^2 \right]$$

flops.
MATLAB

Suppose we replace lines 14–18 of lufact in Algorithm 2.4.1 with
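14 for k = 1:n-1
15     U(k, k:n) = A_k(k, k:n);
16     L(k:n, k) = A_k(k:n, k) / U(k, k);
17     A_k(k:n, k:n) = A_k(k:n, k:n) - L(k:n, k) * U(k, k:n);
18 end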
Line 16 above divides each element of the vector A_k(k:n, k) by a scalar. Hence, the number of flops equals the length of the vector, which is $n-k+1$.

Line 17 has an outer product followed by a matrix subtraction. The definition (A.5) of the outer product makes it clear that the computation takes one flop (multiplication) per element of the result, which here is $(n-k+1)^2$ flops. The number of subtractions is identical.
Altogether, the factorization takes

$$\sum_{k=1}^{n-1} \left[ (n-k+1) + 2(n-k+1)^2 \right]$$

flops.
Python

Suppose we replace lines 14–17 of lufact in Algorithm 2.4.1 with
14 for k in range(n-1):
15     U[k, k:n] = A_k[k, k:n]
16     L[k:n, k] = A_k[k:n, k] / U[k, k]
17     A_k[k:n, k:n] -= np.outer(L[k:n, k], U[k, k:n])
We will use the following handy fact: in Python, the slice k:n selects $n-k$ elements.
Line 16 above divides each element of the vector A_k[k:n, k] by a scalar. Hence, the number of flops equals the length of the vector, which is $n-k$.

Line 17 has an outer product followed by a matrix subtraction. The definition (A.5) of the outer product makes it clear that the computation takes one flop (multiplication) per element of the result, which here is $(n-k)^2$ flops. The number of subtractions is identical.
Altogether, the factorization takes

$$\sum_{k=0}^{n-2} \left[ (n-k) + 2(n-k)^2 \right] = \sum_{k=1}^{n-1} \left[ (n-k+1) + 2(n-k+1)^2 \right]$$

flops.
There are different ways to simplify this total count. We will make a change of summation index using $j=n-k$. The endpoints of the sum are $j=n-1$ when $k=1$ and $j=1$ when $k=n-1$. Since the order of terms in a sum doesn’t matter, we get

$$\sum_{k=1}^{n-1} \left[ (n-k+1) + 2(n-k+1)^2 \right] = \sum_{j=1}^{n-1} \left[ (j+1) + 2(j+1)^2 \right] \sim \frac{2}{3}n^3,$$

where the final step uses (2.5.5). Hence LU factorization requires $\sim\frac{2}{3}n^3$ flops as $n\to\infty$.
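To connect this count to working code, here is a minimal Python sketch (our own illustration, modeled on the Python replacement loop above; lufact_flops is a hypothetical name) that tallies the per-iteration flops and compares the total against $\frac{2}{3}n^3$:

import numpy as np

def lufact_flops(A):
    """Economical outer-product LU (no pivoting), tallying flops as counted in the text."""
    n = A.shape[0]
    A_k = A.astype(float).copy()
    L, U = np.eye(n), np.zeros((n, n))
    flops = 0
    for k in range(n - 1):
        U[k, k:n] = A_k[k, k:n]
        L[k:n, k] = A_k[k:n, k] / U[k, k]                  # n-k divisions
        A_k[k:n, k:n] -= np.outer(L[k:n, k], U[k, k:n])    # 2(n-k)^2 flops
        flops += (n - k) + 2 * (n - k) ** 2
    U[n - 1, n - 1] = A_k[n - 1, n - 1]
    return L, U, flops

for n in [10, 100, 400]:
    A = np.random.randn(n, n) + n * np.eye(n)   # diagonally dominant, so no pivoting needed
    L, U, flops = lufact_flops(A)
    assert np.allclose(L @ U, A)                # the factorization is correct
    print(f"n = {n:4d}   flops / ((2/3) n^3) = {flops / (2 * n**3 / 3):.3f}")

The printed ratio drifts toward 1 as $n$ grows, since the terms discarded by $\sim$ are of lower order.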
In practice, flops are not the only aspect of an implementation that occupies significant time. Our position is that counting flops as a measure of performance is a useful oversimplification. We will assume that LU factorization (and as a result, the solution of a linear system of $n$ equations) requires a real-world time that is roughly $O(n^3)$. This growth rate is a great deal more tolerable than, say, $O(2^n)$, but it does mean that for (at this writing) $n$ greater than 10,000 or so, something other than general LU factorization will have to be used.
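As a rough empirical check (a Python sketch using NumPy; absolute times and ratios will vary with machine and BLAS library), doubling $n$ should multiply the solution time by roughly $2^3=8$:

import time
import numpy as np

# Time np.linalg.solve for a sequence of doubling sizes; with O(n^3)
# scaling, each doubling of n should increase the time by a factor near 8.
prev = None
for n in [1000, 2000, 4000]:
    A = np.random.randn(n, n)
    b = np.random.randn(n)
    start = time.perf_counter()
    np.linalg.solve(A, b)
    t = time.perf_counter() - start
    ratio = "" if prev is None else f"   ratio = {t / prev:.1f}"
    print(f"n = {n:5d}   t = {t:.3f} s{ratio}")
    prev = t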