

7.3. Singular value decomposition

We now introduce another factorization that is as fundamental as the EVD.

Definition 7.3.1 (Singular value decomposition (SVD))

The singular value decomposition of an $m\times n$ matrix $\mathbf{A}$ is

\mathbf{A} = \mathbf{U} \mathbf{S} \mathbf{V}^*,

(7.3.1)

where $\mathbf{U}\in\mathbb{C}^{m\times m}$ and $\mathbf{V}\in\mathbb{C}^{n\times n}$ are unitary and $\mathbf{S}\in\mathbb{R}^{m\times n}$ is real and diagonal with nonnegative elements.

The columns of $\mathbf{U}$ and $\mathbf{V}$ are called left and right singular vectors, respectively. The diagonal elements of $\mathbf{S}$, written $\sigma_1,\ldots,\sigma_r$ for $r=\min\{m,n\}$, are called the singular values of $\mathbf{A}$ and are ordered so that

\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r\ge 0, \qquad r=\min\{m,n\}.

(7.3.2)

We call $\sigma_1$ the principal singular value and $\mathbf{u}_{1}$ and $\mathbf{v}_{1}$ the principal singular vectors.

7.3.1 Connections to the EVD

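The eigenvalues of $\mathbf{A}^*\mathbf{A}$ are determined by the singular values of $\mathbf{A}$: each nonzero eigenvalue of $\mathbf{A}^*\mathbf{A}$ is the square of a singular value of $\mathbf{A}$.

Proof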

Let $\mathbf{A}=\mathbf{U}\mathbf{S}\mathbf{V}^*$ be $m\times n$. Because $\mathbf{S}$ is real, $\mathbf{S}^* = \mathbf{S}^T$. We express the square hermitian matrix $\mathbf{B}=\mathbf{A}^*\mathbf{A}$ as

\mathbf{B} = (\mathbf{V}\mathbf{S}^*\mathbf{U}^*) (\mathbf{U}\mathbf{S}\mathbf{V}^*) = \mathbf{V}\mathbf{S}^*\mathbf{S}\mathbf{V}^* = \mathbf{V}(\mathbf{S}^T\mathbf{S})\mathbf{V}^{-1},

(7.3.4)

where we have used the fact that $\mathbf{V}$ is unitary. Note that $\mathbf{S}^T\mathbf{S}$ is a diagonal $n \times n$ matrix. There are two cases to consider. If $m \ge n$, then

\mathbf{S}^T\mathbf{S} = \begin{bmatrix} \sigma_1^2 & & \\ & \ddots & \\ & & \sigma_n^2 \end{bmatrix}.

(7.3.5)

On the other hand, if $m<n$, then

\mathbf{S}^T\mathbf{S} = \begin{bmatrix} \sigma_1^2 & & & \\ & \ddots & & \\ & & \sigma_m^2 & \\ & & & \boldsymbol{0} \end{bmatrix}.

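(7.3.6)

In both cases, (7.3.4) is an EVD of $\mathbf{B}$, so the eigenvalues of $\mathbf{A}^*\mathbf{A}$ are $\sigma_1^2,\ldots,\sigma_r^2$, together with $n-m$ zeros when $m<n$.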
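The relationship in (7.3.4) can be checked numerically. The following is a minimal sketch, assuming random test matrices; the sizes, the seed, and the variable names are illustrative choices rather than anything from the text. It compares the eigenvalues of $\mathbf{A}^*\mathbf{A}$ with the squared singular values in both cases, $m \ge n$ and $m<n$.

```matlab
rng(0);                                  % fixed seed, only for reproducibility

% Case m >= n: A'*A is n-by-n with eigenvalues sigma_k^2.
A = randn(5, 3);
s = svd(A);                              % singular values, in descending order
B = A' * A;
B = (B + B') / 2;                        % guard against roundoff asymmetry
lambda = sort(eig(B), 'descend');
err_tall = norm(lambda - s.^2)

% Case m < n: the same squares, padded with n - m zero eigenvalues.
A = randn(3, 5);
s = svd(A);
B = A' * A;
B = (B + B') / 2;
lambda = sort(eig(B), 'descend');
err_wide = norm(lambda - [s.^2; zeros(2, 1)])
```

Both differences should be at the level of roundoff. Forming $\mathbf{A}^*\mathbf{A}$ explicitly is done here only to illustrate the identity.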

Another close connection between EVD and SVD comes via the $(m+n)\times (m+n)$ matrix

\mathbf{C} = \begin{bmatrix} 0 & \mathbf{A}^* \\ \mathbf{A} & 0 \end{bmatrix}.

(7.3.7)

If $\sigma$ is a singular value of $\mathbf{A}$, then $\sigma$ and $-\sigma$ are eigenvalues of $\mathbf{C}$, and the associated eigenvector immediately reveals a left and a right singular vector (see Exercise 7.3.11). This connection is implicitly exploited by software to compute the SVD.
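As a quick numerical check of the eigenvalue claim for $\mathbf{C}$, here is a sketch that assumes a small real test matrix chosen only for illustration. It builds $\mathbf{C}$ as in (7.3.7) and compares its eigenvalues with $\pm\sigma_k$; the remaining $|m-n|$ eigenvalues are zero.

```matlab
A = [1 2; 3 4; 5 6];                 % illustrative 3-by-2 matrix
[m, n] = size(A);
C = [zeros(n) A'; A zeros(m)];       % (m+n)-by-(m+n) matrix from (7.3.7)

s = svd(A);
lambda = sort(eig(C), 'descend');

% Expect +sigma_k and -sigma_k, plus |m - n| zero eigenvalues.
expected = sort([s; -s; zeros(abs(m - n), 1)], 'descend');
err_C = norm(lambda - expected)
```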

7.3.2 Interpreting the SVD

Another way to write $\mathbf{A}=\mathbf{U}\mathbf{S}\mathbf{V}^*$ is

\mathbf{A}\mathbf{V}=\mathbf{U}\mathbf{S}.

(7.3.8)

Taken columnwise, this equation means

\mathbf{A} \mathbf{v}_{k} = \sigma_k \mathbf{u}_{k}, \qquad k=1,\ldots,r=\min\{m,n\}.

(7.3.9)

In words, each right singular vector is mapped by $\mathbf{A}$ to a scaled version of its corresponding left singular vector; the magnitude of scaling is its singular value.
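The columnwise relation (7.3.9) is easy to confirm in code. This sketch reuses the test matrix from Example 7.3.4; the loop and the names `resid` and `rel_resid` are illustrative assumptions. It measures how far each $\mathbf{A}\mathbf{v}_k$ is from $\sigma_k\mathbf{u}_k$, relative to $\sigma_1$.

```matlab
A = vander(1:5);
A = A(:, 1:4);                       % 5-by-4 matrix, as in Example 7.3.4
[U, S, V] = svd(A);
r = min(size(A));

% Residual of A*v_k - sigma_k*u_k for each k = 1, ..., r.
resid = zeros(r, 1);
for k = 1:r
    resid(k) = norm(A * V(:, k) - S(k, k) * U(:, k));
end
rel_resid = resid / S(1, 1)
```

Each entry of `rel_resid` should be on the order of machine precision. The columns $\mathbf{u}_{r+1},\ldots,\mathbf{u}_m$ of $\mathbf{U}$ do not appear in the relation at all.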

Both the SVD and the EVD describe a matrix in terms of some special vectors and a few scalars. Table 7.3.1 summarizes the key differences. The SVD sacrifices having the same basis in both source and image spaces—after all, they may not even have the same dimension—but as a result gains orthogonality in both spaces.

Table 7.3.1: Comparison of the EVD and SVD

EVD	SVD
exists for most square matrices	exists for all rectangular and square matrices
$\mathbf{A}\mathbf{x}_k = \lambda_k \mathbf{x}_k$	$\mathbf{A} \mathbf{v}_k = \sigma_k \mathbf{u}_k$
same basis for domain and range of $\mathbf{A}$	two orthonormal bases
may have poor conditioning	perfectly conditioned

7.3.3 Thin form

In the section on the QR factorization, we saw that a matrix has both full and thin forms of the QR factorization. A similar situation holds with the SVD.

Suppose $\mathbf{A}$ is $m\times n$ with $m > n$ and let $\mathbf{A}=\mathbf{U}\mathbf{S}\mathbf{V}^*$ be an SVD. Because $\mathbf{S}$ is diagonal, its last $m-n$ rows are all zero. Hence

\begin{align*} \mathbf{U} \mathbf{S} & = \begin{bmatrix} \mathbf{u}_1 & \cdots & \mathbf{u}_n & \mathbf{u}_{n+1} & \cdots & \mathbf{u}_m \end{bmatrix} \begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_n \\ & & \\ & \boldsymbol{0} & \\ & & \end{bmatrix} \\ &= \begin{bmatrix} \mathbf{u}_1 & \cdots & \mathbf{u}_n \end{bmatrix} \begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_n \end{bmatrix} = \hat{\mathbf{U}} \hat{\mathbf{S}}, \end{align*}

in which $\hat{\mathbf{U}}$ is $m\times n$ and $\hat{\mathbf{S}}$ is $n\times n$. This allows us to define the thin SVD

\mathbf{A}=\hat{\mathbf{U}}\hat{\mathbf{S}}\mathbf{V}^*,

(7.3.10)

in which $\hat{\mathbf{S}}$ is square and diagonal and $\hat{\mathbf{U}}$ has orthonormal columns (ONC) but is not square.

So, in sketch form, a full SVD of a matrix that is taller than it is wide looks like

\rule{1cm}{2.4cm} \; \raisebox{11mm}{=} \; \rule{2.4cm}{2.4cm} \; \raisebox{11mm}{$\centerdot$} \; \rule{1cm}{2.4cm}\; \raisebox{11mm}{$\centerdot$} \; \rule[6mm]{1cm}{1cm}\quad

(7.3.11)

while a thin SVD looks like

\rule{1cm}{2.4cm} \; \raisebox{11mm}{=} \; \rule{1cm}{2.4cm} \; \raisebox{11mm}{$\centerdot$} \; \rule[6mm]{1cm}{1cm} \; \raisebox{11mm}{$\centerdot$} \; \rule[6mm]{1cm}{1cm}

(7.3.12)

The thin form retains all the information about $\mathbf{A}$ from the SVD; the factorization is still an equality, not an approximation. It is computationally preferable when $m \gg n$, since it requires far less storage than a full SVD. For a matrix with more columns than rows, one can derive a thin form by taking the adjoint of the thin SVD of $\mathbf{A}^*$.
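In MATLAB, the thin form is available through the `"econ"` option of `svd`. The sketch below is illustrative only: the variable names are assumptions, and the test matrix is borrowed from Example 7.3.4. It checks the sizes of the thin factors and that the product still reproduces $\mathbf{A}$, then applies the adjoint trick to a wide matrix.

```matlab
A = vander(1:5);
A = A(:, 1:4);                         % tall 5-by-4 matrix
[Uh, Sh, V] = svd(A, "econ");          % thin (economy-size) SVD
size_Uh = size(Uh)                     % m-by-n
size_Sh = size(Sh)                     % n-by-n
recon_err = norm(A - Uh * Sh * V') / norm(A)
onc_err = norm(Uh' * Uh - eye(4))      % columns of Uh are orthonormal

% Wide matrix: take the adjoint of the thin SVD of its adjoint.
B = A';                                % 4-by-5
[W, Sb, Z] = svd(B', "econ");          % B' = W*Sb*Z'
recon_wide = norm(B - Z * Sb * W') / norm(B)
```

Here `Uh` needs storage for $mn$ entries instead of the $m^2$ entries of the full $\mathbf{U}$, which is the saving that matters when $m \gg n$.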

7.3.4 SVD and the 2-norm

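The SVD is intimately connected to the 2-norm, as the following theorem describes. In particular, $\|\mathbf{A}\|_2 = \sigma_1$, and when $\mathbf{A}$ has full rank, its 2-norm condition number is $\sigma_1/\sigma_r$ with $r=\min\{m,n\}$.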

The conclusion (7.3.14) can be proved by vector calculus. In the square case $m=n$, $\mathbf{A}$ having full rank is equivalent to $\mathbf{A}$ being invertible. The SVD is the usual means for computing the 2-norm and condition number of a matrix.
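To illustrate the remark about rank and invertibility, here is a minimal sketch. It relies on the standard fact that the rank of a matrix equals its number of nonzero singular values, and it uses `magic(4)`, a singular matrix chosen here only for illustration. The tolerance mimics the default documented for MATLAB's `rank`, and the variable names are assumptions.

```matlab
M = magic(4);                          % a singular 4-by-4 matrix
s = svd(M);
tol = max(size(M)) * eps(s(1));        % tolerance in the style of rank's default
numerical_rank = sum(s > tol)          % count singular values above roundoff
builtin_rank = rank(M)
sigma_ratio = s(1) / s(end)            % enormous: sigma_4 is zero up to roundoff
```

In floating point, the smallest singular value of a singular matrix typically comes out tiny rather than exactly zero, so a very large ratio $\sigma_1/\sigma_n$ is the practical signal that a square matrix is numerically singular.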

Example 7.3.4 (SVD properties)

We verify some of the fundamental SVD properties using the built-in svd function.

A = vander(1:5);
A = A(:, 1:4)

[U, S, V] = svd(A);
disp(sprintf("U is %d by %d. S is %d by %d. V is %d by %d.\n", size(U), size(S), size(V)))

U is 5 by 5. S is 5 by 4. V is 4 by 4.

We verify the orthogonality of the singular vectors as follows:

norm(U' * U - eye(5))
norm(V' * V - eye(4))

Here is verification of the connections between the singular values, norm, and condition number.

s = diag(S);
norm_A = norm(A)
sigma_max = s(1)

cond_A = cond(A)
sigma_ratio = s(1) / s(end)

7.3.5 Exercises

Exercise 7.3.1

✍ Each factorization below is algebraically correct. The notation $\mathbf{I}_n$ means an $n\times n$ identity. In each case, determine whether it is an SVD. If it is, write down $\sigma_1$, $\mathbf{u}_1$, and $\mathbf{v}_1$. If it is not, state all of the ways in which it fails the required properties.

(a) $\begin{bmatrix} 0 & 0 \\ 0 & -1 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$

(b) $\begin{bmatrix} 0 & 0 \\ 0 & -1 \end{bmatrix} = \mathbf{I}_2 \begin{bmatrix} 0 & 0 \\ 0 & -1 \end{bmatrix} \mathbf{I}_2$

(c) $\begin{bmatrix} 1 & 0\\ 0 & \sqrt{2}\\ 1 & 0 \end{bmatrix} = \begin{bmatrix} \alpha & 0 & -\alpha \\ 0 & 1 & 0 \\ \alpha & 0 & -\alpha \end{bmatrix} \begin{bmatrix} \sqrt{2} & 0 \\ 0 & \sqrt{2} \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad \alpha=1/\sqrt{2}$

(d) $\begin{bmatrix} \sqrt{2} & \sqrt{2}\\ -1 & 1\\ 0 & 0 \end{bmatrix} = \mathbf{I}_3 \begin{bmatrix} 2 & 0 \\ 0 & \sqrt{2} \\ 0 & 0 \end{bmatrix} \begin{bmatrix} \alpha & \alpha \\ -\alpha & \alpha \end{bmatrix}, \quad \alpha=1/\sqrt{2}$