2.7 Vector and matrix norms
Tobin A. Driscoll and Richard J. Braun
The manipulations on matrices and vectors so far in this chapter have been algebraic, much like those in an introductory linear algebra course. In order to progress to the analysis of the algorithms we have introduced, we need a way to measure the size of vectors and matrices—size in the sense of magnitude or distance, not the number of rows and columns.
2.7.1 Vector norms

For vectors, we use a norm $\| \cdot \|$, which is a function from $\mathbb{R}^n$ to $\mathbb{R}$ with the following properties for all $n$-vectors $\mathbf{x}, \mathbf{y}$ and scalars $\alpha$:[1]
$$
\begin{align*}
\| \mathbf{x} \| &\ge 0, \\
\| \mathbf{x} \| &= 0 \;\Leftrightarrow\; \mathbf{x} = \boldsymbol{0}, \\
\| \alpha \mathbf{x} \| &= |\alpha| \, \| \mathbf{x} \|, \\
\| \mathbf{x} + \mathbf{y} \| &\le \| \mathbf{x} \| + \| \mathbf{y} \|.
\end{align*}
$$

The last of these properties is known as the triangle inequality. It is natural to interpret $\| \mathbf{x} \| = \| \mathbf{x} - \boldsymbol{0} \|$ as the distance from $\mathbf{x}$ to the origin and $\| \mathbf{x} - \mathbf{y} \|$ as the distance from $\mathbf{x}$ to $\mathbf{y}$. We will be using only the three most important vector norms, defined as follows.
Definition 2.7.1 (Common vector norms)
2-norm: $\quad \| \mathbf{x} \|_2 = \left( \displaystyle\sum_{i=1}^n |x_i|^2 \right)^{1/2} = \sqrt{\mathbf{x}^T \mathbf{x}}$

$\infty$-norm or max-norm: $\quad \| \mathbf{x} \|_\infty = \displaystyle\max_{i=1,\dots,n} |x_i|$

1-norm: $\quad \| \mathbf{x} \|_1 = \displaystyle\sum_{i=1}^n |x_i|$
The 2-norm corresponds to ordinary Euclidean distance.
For any nonzero vector $\mathbf{v}$, we can find a unit vector through the normalization $\mathbf{x} = \mathbf{v} / \|\mathbf{v}\|$. Thus, we can interpret
$$
\mathbf{v} = \| \mathbf{v} \| \cdot \frac{\mathbf{v}}{\| \mathbf{v} \|}
$$
as writing a nonzero vector $\mathbf{v}$ in magnitude–direction form.
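For instance, in the 2-norm the vector $\mathbf{v} = [3, 4]^T$ has magnitude $\|\mathbf{v}\|_2 = 5$ and direction $\mathbf{v}/\|\mathbf{v}\|_2 = [\tfrac{3}{5}, \tfrac{4}{5}]^T$, so its magnitude–direction form is $\mathbf{v} = 5 \cdot [\tfrac{3}{5}, \tfrac{4}{5}]^T$.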
Example 2.7.1 (Vector norms)
Given the vector $\mathbf{x} = [2, -3, 1, -1]^T$, we have
$$
\begin{align*}
\| \mathbf{x} \|_2 &= \sqrt{4 + 9 + 1 + 1} = \sqrt{15}, \\
\| \mathbf{x} \|_\infty &= \max\{2, 3, 1, 1\} = 3, \\
\| \mathbf{x} \|_1 &= 2 + 3 + 1 + 1 = 7.
\end{align*}
$$

In Julia, the LinearAlgebra package has a `norm` function for vector norms.
```julia
using LinearAlgebra

x = [2, -3, 1, -1]
twonorm = norm(x)    # or norm(x, 2)
```
There is also a `normalize` function that divides a vector by its norm, making it a unit vector; a second argument selects which norm to use.
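The output shown below corresponds to normalizing by the $\infty$-norm, i.e., dividing $\mathbf{x}$ by 3; one call that produces it is:

```julia
normalize(x, Inf)    # divide x by its ∞-norm
```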
```
4-element Vector{Float64}:
  0.6666666666666666
 -1.0
  0.3333333333333333
 -0.3333333333333333
```
In MATLAB, `norm` computes vector norms.

```matlab
x = [2; -3; 1; -1];
twonorm = norm(x)    % or norm(x, 2)
```
In Python, the `norm` function from `numpy.linalg` computes vector norms.

```python
from numpy import array
from numpy.linalg import norm

x = array([2, -3, 1, -1])
print(norm(x))    # 2-norm by default
```
Most of the time, when just $\| \mathbf{x} \|$ is written, the 2-norm is implied. However, in this section we use it to mean a generic, unspecified vector norm.
We say that a sequence of vectors $\mathbf{x}_1, \mathbf{x}_2, \ldots$ converges to $\mathbf{x}$ if
$$
\lim_{k \rightarrow \infty} \| \mathbf{x}_k - \mathbf{x} \| = 0.
$$
By definition, a sequence is convergent in the infinity norm if and only if it converges componentwise. The same is true for a convergent sequence in any norm.
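For illustration, consider a sequence converging to $[1, 2]^T$ (an arbitrary example); the norms of the differences shrink toward zero:

```julia
using LinearAlgebra

xstar = [1.0, 2.0]
xk(k) = xstar + 2.0^(-k) * [1, -1]     # a sequence converging to xstar
[norm(xk(k) - xstar) for k in 1:6]     # these values decrease toward zero
```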
2.7.2 Matrix norms

Although we view matrices as two-dimensional, we can also interpret them as vectors: simply stack the columns on top of one another.[2] Hence, we can define a matrix norm via the vector 2-norm; the result is called the Frobenius norm,
$$
\| \mathbf{A} \|_F = \left( \sum_{i,j} |A_{ij}|^2 \right)^{1/2}.
$$
However, it often proves to be more useful to define matrix norms differently.

Definition 2.7.4 (Induced matrix norm)

Given a vector norm $\| \cdot \|$, the matrix norm it induces on an $m \times n$ matrix $\mathbf{A}$ is
$$
\| \mathbf{A} \| = \max_{\| \mathbf{x} \| = 1} \| \mathbf{A}\mathbf{x} \| = \max_{\mathbf{x} \neq \boldsymbol{0}} \frac{\| \mathbf{A}\mathbf{x} \|}{\| \mathbf{x} \|}.
$$

The last equality above follows from linearity (as shown in Exercise 2.7.5). The definition reflects the interpretation of a matrix as a linear operator between $\mathbb{R}^n$ and $\mathbb{R}^m$. Thus, in the 2-norm, for instance,
$$
\| \mathbf{A} \|_2 = \max_{\| \mathbf{x} \|_2 = 1} \| \mathbf{A}\mathbf{x} \|_2.
$$

Example 2.7.2 (Norm of the identity matrix)
In any induced matrix norm, the identity matrix $\mathbf{I}$ has norm 1. This is because $\| \mathbf{I}\mathbf{x} \| = \| \mathbf{x} \|$ for all vectors $\mathbf{x}$, so the maximum of $\| \mathbf{I}\mathbf{x} \|$ over unit vectors is 1.
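A quick numerical check in Julia, for illustration:

```julia
using LinearAlgebra

opnorm(Matrix(1.0I, 3, 3))    # induced 2-norm of the 3×3 identity, equal to 1.0
```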
One can interpret the definition of an induced norm geometrically. Each vector $\mathbf{x}$ on the unit “sphere” (as defined by the chosen vector norm) is mapped to its image $\mathbf{A}\mathbf{x}$, and the norm of $\mathbf{A}$ is the radius of the smallest “sphere” that encloses all such images.
The definition of an induced matrix norm may seem oddly complicated. However, there are some key properties that follow directly from the definition.
Theorem 2.7.2 (Norm inequalities)
Let $\| \cdot \|$ designate a matrix norm and the vector norm that induced it. Then for all matrices and vectors of compatible sizes,
$$
\| \mathbf{A}\mathbf{x} \| \le \| \mathbf{A} \| \cdot \| \mathbf{x} \|. \tag{2.7.8}
$$
For all matrices of compatible sizes,
$$
\| \mathbf{A}\mathbf{B} \| \le \| \mathbf{A} \| \cdot \| \mathbf{B} \|. \tag{2.7.9}
$$
For a square matrix $\mathbf{A}$,
$$
\| \mathbf{A}^k \| \le \| \mathbf{A} \|^k \text{ for any integer } k \ge 0. \tag{2.7.10}
$$

Proof. The first result is trivial if $\mathbf{x} = \boldsymbol{0}$; otherwise,
$$
\frac{\| \mathbf{A}\mathbf{x} \|}{\| \mathbf{x} \|} \le \max_{\mathbf{x} \neq \boldsymbol{0}} \frac{\| \mathbf{A}\mathbf{x} \|}{\| \mathbf{x} \|} = \| \mathbf{A} \|.
$$
Inequality (2.7.9) then follows because
$$
\| \mathbf{A}\mathbf{B}\mathbf{x} \| = \| \mathbf{A}(\mathbf{B}\mathbf{x}) \| \le \| \mathbf{A} \| \cdot \| \mathbf{B}\mathbf{x} \| \le \| \mathbf{A} \| \cdot \| \mathbf{B} \| \cdot \| \mathbf{x} \|,
$$
and then
$$
\| \mathbf{A}\mathbf{B} \| = \max_{\mathbf{x} \neq \boldsymbol{0}} \frac{\| \mathbf{A}\mathbf{B}\mathbf{x} \|}{\| \mathbf{x} \|} \le \max_{\mathbf{x} \neq \boldsymbol{0}} \| \mathbf{A} \| \cdot \| \mathbf{B} \| = \| \mathbf{A} \| \cdot \| \mathbf{B} \|.
$$
Finally, (2.7.10) results from repeated application of (2.7.9).
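As a quick sanity check, we can verify these inequalities numerically in the 2-norm; the matrices and vector below are arbitrary illustrative choices:

```julia
using LinearAlgebra

A = [2 0; 1 -1]     # arbitrary small matrices and vector
B = [1 3; -2 1]
x = [1.0, -2.0]
@show norm(A * x) <= opnorm(A) * norm(x)        # (2.7.8)
@show opnorm(A * B) <= opnorm(A) * opnorm(B)    # (2.7.9)
@show opnorm(A^3) <= opnorm(A)^3;               # (2.7.10)
```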
Two of the vector norms we have encountered induce matrix norms that are easy to compute from the matrix elements.
Theorem 2.7.3 (Matrix $\infty$-norm and 1-norm)

$$
\| \mathbf{A} \|_\infty = \max_{1 \le i \le n} \sum_{j=1}^n |A_{ij}|, \tag{2.7.11}
$$

$$
\| \mathbf{A} \|_1 = \max_{1 \le j \le n} \sum_{i=1}^n |A_{ij}|. \tag{2.7.12}
$$

Tip
A mnemonic for (2.7.11) and (2.7.12) is that the $\infty$ symbol extends horizontally while the 1 character extends vertically, each indicating the direction of summation in its formula. Also, for $m \times 1$ matrices both formulas give the same result as the corresponding vector norm. In both cases you must take absolute values of the matrix elements before summing.
Despite the lack of a simple formula for it, the 2-norm is the default choice for many applications and theorems.
Example 2.7.3 (Matrix norms)
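In Julia, this example works with a specific $2\times 2$ matrix; a definition consistent with the output shown below is:

```julia
using LinearAlgebra

A = [2 0; 1 -1]
```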
```
2×2 Matrix{Int64}:
 2   0
 1  -1
```
In Julia, one uses `norm` for vector norms and for the Frobenius norm of a matrix, which is like stacking the matrix into a single vector before taking the 2-norm.
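For example, applied to the matrix above:

```julia
norm(A)    # Frobenius norm, same as norm(vec(A))
```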
In Julia, `norm(A)` is the Frobenius norm of a matrix, not the induced norm. To compute the induced norm, use `opnorm`. The default for `opnorm` is the 2-norm.
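For instance, for the matrix defined above:

```julia
opnorm(A)    # induced matrix 2-norm
```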
You can get the 1-norm as well.
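For instance, applied to the same matrix:

```julia
opnorm(A, 1)    # induced matrix 1-norm
```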
According to (2.7.12), the matrix 1-norm equals the maximum of the absolute column sums.
Tip

Use `sum` to sum along a dimension of a matrix. You can also sum over the entire matrix by omitting the `dims` argument. The `maximum` and `minimum` functions also work along one dimension or over an entire matrix. To get both values at once, use `extrema`.
```julia
# Maximum absolute column sum (sum along the 1st matrix dimension):
maximum( sum(abs.(A), dims=1) )
```
Similarly, we can get the $\infty$-norm and check our formula for it.
```julia
# Maximum absolute row sum (sum along the 2nd matrix dimension):
maximum( sum(abs.(A), dims=2) )
```
Next we illustrate a geometric interpretation of the 2-norm. First, we will sample a lot of vectors on the unit circle in $\mathbb{R}^2$.
Tip
You can use functions as values, e.g., as elements of a vector.
```julia
# Construct 601 unit column vectors.
θ = 2π * (0:1/600:1)    # type \theta then Tab
x = [ fun(t) for fun in [cos, sin], t in θ ];
```
To create an array of plots, start with a `plot` that has a `layout` argument, then do subsequent `plot!` calls with a `subplot` argument.
```julia
using Plots, LaTeXStrings

plot(aspect_ratio=1, layout=(1, 2),
    xlabel=L"x_1", ylabel=L"x_2")
plot!(x[1, :], x[2, :], subplot=1, title="Unit circle")
```
The linear function $\mathbf{f}(\mathbf{x}) = \mathbf{A}\mathbf{x}$ defines a mapping from $\mathbb{R}^2$ to $\mathbb{R}^2$. We can apply `A` to every column of `x` by using a single matrix multiplication.
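The plotting call below uses a variable `Ax`; it can be computed with one multiplication:

```julia
Ax = A * x;    # 2×601 matrix whose columns are the mapped vectors
```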
The image of the transformed vectors is an ellipse.
```julia
plot!(Ax[1, :], Ax[2, :],
    subplot=2, title="Image under x → Ax")
```
That ellipse just touches the circle of radius $\|\mathbf{A}\|_2$.
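Here `twonorm` denotes the induced 2-norm of `A`, which can be computed with `opnorm`:

```julia
twonorm = opnorm(A)    # radius of the enclosing circle
```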
```julia
plot!(twonorm*x[1, :], twonorm*x[2, :], subplot=2, l=:dash)
```
In MATLAB, the default matrix norm is the 2-norm.
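For instance, with the matrix used throughout this example:

```matlab
A = [2 0; 1 -1];
twonorm = norm(A)    % 2-norm by default for matrices
```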
You can get the 1-norm as well.
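For instance:

```matlab
norm(A, 1)    % induced 1-norm
```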
According to (2.7.12), the matrix 1-norm equals the maximum of the absolute column sums.
Tip

Use `sum` to sum along a dimension of a matrix. The `max` and `min` functions also work along one dimension.
```matlab
% Maximum absolute column sum (sum along the 1st matrix dimension):
max( sum(abs(A), 1) )
```
Similarly, we can get the $\infty$-norm and check our formula for it.
```matlab
% Maximum absolute row sum (sum along the 2nd matrix dimension):
max( sum(abs(A), 2) )
```
Next we illustrate a geometric interpretation of the 2-norm. First, we will sample a lot of vectors on the unit circle in $\mathbb{R}^2$.
```matlab
theta = linspace(0, 2*pi, 601);
x = [ cos(theta); sin(theta) ];    % 601 unit column vectors
```
```matlab
clf
subplot(1, 2, 1)
plot(x(1, :), x(2, :)), axis equal
title('Unit circle in 2-norm')
xlabel('x_1')
ylabel('x_2');
```
The linear function $\mathbf{f}(\mathbf{x}) = \mathbf{A}\mathbf{x}$ defines a mapping from $\mathbb{R}^2$ to $\mathbb{R}^2$. We can apply `A` to every column of `x` by using a single matrix multiplication.
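The plotting commands below use `Ax` and `twonorm`; with the `A` defined above, they can be computed as:

```matlab
Ax = A * x;           % images of all the unit vectors
twonorm = norm(A);    % induced 2-norm, the radius of the enclosing circle
```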
The image of the transformed vectors is an ellipse that just touches the circle of radius $\|\mathbf{A}\|_2$:
```matlab
subplot(1,2,2), plot(Ax(1,:), Ax(2,:)), axis equal
hold on, plot(twonorm * x(1,:), twonorm * x(2,:), '--')
title('Image of Ax, with ||A||')
xlabel('x_1')
ylabel('x_2');
```
In Python, the `norm` function from `numpy.linalg` computes matrix norms as well.

```python
from numpy import array
from numpy.linalg import norm

A = array([[2, 0], [1, -1]])
```
The default matrix norm is not the 2-norm. Instead, you must provide the 2 explicitly.
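For instance, for the matrix `A` defined above:

```python
print(norm(A, 2))    # induced 2-norm
```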
You can get the 1-norm as well.
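For instance:

```python
print(norm(A, 1))    # induced 1-norm
```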
According to (2.7.12), the 1-norm equals the maximum of the absolute column sums:
```python
from numpy import abs, sum    # elementwise abs and axis-aware sum from NumPy

# Maximum absolute column sum (sum down each column):
print(max( sum(abs(A), axis=0) ))
```
Similarly, we can get the $\infty$-norm and check our formula for it.
```python
# Maximum absolute row sum (sum across each row):
print(max( sum(abs(A), axis=1) ))
```
Here we illustrate the geometric interpretation of the 2-norm. First, we will sample a lot of vectors on the unit circle in $\mathbb{R}^2$.
```python
from numpy import linspace, pi, cos, sin, vstack

theta = linspace(0, 2*pi, 601)
x = vstack([cos(theta), sin(theta)])    # 601 unit column vectors
```
The linear function $\mathbf{f}(\mathbf{x}) = \mathbf{A}\mathbf{x}$ defines a mapping from $\mathbb{R}^2$ to $\mathbb{R}^2$. We can apply `A` to every column of `x` simply by using a matrix multiplication.
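The plotting code below refers to the mapped vectors as `y`; they can be computed with one matrix product:

```python
y = A @ x    # 2×601 array of image vectors
```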
We plot the unit circle on the left and the image of all mapped vectors on the right:
```python
from matplotlib.pyplot import subplot, plot, axis, title, xlabel, ylabel

subplot(1, 2, 1)
plot(x[0, :], x[1, :])
axis("equal")
title("Unit circle")
xlabel("$x_1$")
ylabel("$x_2$")

subplot(1, 2, 2)
plot(y[0, :], y[1, :])
plot(norm(A, 2) * x[0, :], norm(A, 2) * x[1, :], "--")
axis("equal")
title("Image under map")
xlabel("$y_1$")
ylabel("$y_2$");
```
As seen on the right-side plot, the image of the transformed vectors is an ellipse that just touches the circle of radius $\|\mathbf{A}\|_2$.
2.7.3 Exercises

✍ (a) Draw the unit “circle” in the $\infty$-norm, i.e., the set of all vectors $\mathbf{x} \in \mathbb{R}^2$ such that $\| \mathbf{x} \|_\infty = 1$.

(b) Draw the unit “circle” in the 1-norm.
✍ Prove that for all vectors $\mathbf{x} \in \mathbb{R}^n$,

(a) $\| \mathbf{x} \|_\infty \le \| \mathbf{x} \|_2$; $\qquad$ (b) $\| \mathbf{x} \|_2 \le \| \mathbf{x} \|_1$.
✍ Prove that for any vectors $\mathbf{x}$, $\mathbf{y}$ in $\mathbb{R}^n$, $\ |\mathbf{x}^T\mathbf{y}| \le \| \mathbf{x} \|_1 \, \| \mathbf{y} \|_\infty$.
✍ Prove using Definition 2.7.4 that for any induced matrix norm, matrix $\mathbf{A}$, and scalar $c$, $\ \| c\mathbf{A} \| = |c| \cdot \| \mathbf{A} \|$.
✍ Let
$$
\mathbf{A} = \begin{bmatrix} -1 & 1 \\ 2 & 2 \end{bmatrix}.
$$

(a) Find all vectors satisfying $\|\mathbf{x}\|_\infty = 1$ and $\| \mathbf{A}\mathbf{x} \|_\infty = \| \mathbf{A} \|_\infty$.

(b) Find a vector satisfying $\|\mathbf{x}\|_1 = 1$ and $\| \mathbf{A}\mathbf{x} \|_1 = \| \mathbf{A} \|_1$.

(c) Find a vector satisfying $\|\mathbf{x}\|_2 = 1$ such that $\| \mathbf{A}\mathbf{x} \|_2 = \| \mathbf{A} \|_2$. (Hint: Use the definition of $\|\mathbf{A}\|_2$ as the maximum of $\|\mathbf{A}\mathbf{x}\|_2$ over unit vectors, and parameterize the unit vectors as $(\cos\theta, \sin\theta)$. Then $\| \mathbf{A}\mathbf{x} \|_2^2$ is a function of $\theta$ that is easy to maximize.)
✍ Prove that for any induced matrix norm and nonsingular matrix $\mathbf{A}$, $\ \| \mathbf{A}^{-1} \| \ge (\| \mathbf{A} \|)^{-1}$. (Hint: Apply Theorem 2.7.2.)
✍ (a) Prove that for any $\mathbf{v} \in \mathbb{R}^n$,
$$
\| \mathbf{v} \|_p \ge \max_{i=1,\ldots,n} |v_i|,
$$
where $p = 1$, 2, or $\infty$.

(b) Prove that for any $\mathbf{A} \in \mathbb{R}^{n \times n}$,
$$
\| \mathbf{A} \|_p \ge \max_{i,j=1,\ldots,n} |A_{ij}|,
$$
where $p = 1$, 2, or $\infty$. (Hint: For $p = 2$, rearrange (2.7.8) for a well-chosen particular value of $\mathbf{x}$.)
✍ Prove using Definition 2.7.4 that if $\mathbf{D}$ is a diagonal matrix, then $\|\mathbf{D}\|_2 = \max_i |D_{ii}|$. You may assume the matrix is real and square, but that does not affect the result or the proof in any significant way. (Hint: Let $M = \max_i |D_{ii}|$. Proceed in two stages, showing that $\|\mathbf{D}\|_2 \ge M$ and separately that $\|\mathbf{D}\|_2 \le M$.)
✍ Suppose that $\mathbf{A}$ is $n \times n$ and that $\| \mathbf{A} \| < 1$ in some induced matrix norm.

(a) Show that $(\mathbf{I} - \mathbf{A})$ is nonsingular. (Hint: Use the definition of an induced matrix norm to show that if $(\mathbf{I} - \mathbf{A})\mathbf{x} = \boldsymbol{0}$ for some nonzero $\mathbf{x}$, then $\| \mathbf{A} \| \ge 1$.)

(b) Show that $\displaystyle\lim_{m \rightarrow \infty} \mathbf{A}^m = \boldsymbol{0}$. (For matrices as with vectors, we say $\mathbf{B}_m \rightarrow \mathbf{L}$ if $\| \mathbf{B}_m - \mathbf{L} \| \rightarrow 0$.)

(c) Use (a) and (b) to show that we may obtain the geometric series
$$
(\mathbf{I} - \mathbf{A})^{-1} = \sum_{k=0}^\infty \mathbf{A}^k.
$$
(Hint: Start with $\left(\sum_{k=0}^m \mathbf{A}^k\right)(\mathbf{I} - \mathbf{A})$ and take the limit.)
[1] The same statements work for vectors with complex entries, with the complex modulus in place of absolute values.

[2] Column stacking is actually how matrices are stored in memory within Julia and is known as column-major order. MATLAB and Fortran also use column-major order, while C and Python use row-major order, in which the rows are stacked.