from numpy import *
from scipy import linalg
from scipy.linalg import norm
from matplotlib.pyplot import *
from prettytable import PrettyTable
from timeit import default_timer as timer
import sys
sys.path.append('fncbook/')
import fncbook as FNC
# This (optional) block is for improving the display of plots.
# from IPython.display import set_matplotlib_formats
# set_matplotlib_formats("svg","pdf")
# %config InlineBackend.figure_format = 'svg'
rcParams["figure.figsize"] = [7, 4]
rcParams["lines.linewidth"] = 2
rcParams["lines.markersize"] = 4
rcParams['animation.html'] = "jshtml" # or try "html5"
The SVD has another important property that proves very useful in a variety of applications. Let A be a real m×n matrix with SVD A=USVT and (momentarily) m≥n. Another way of writing the thin form of the SVD is
where r is the rank of A. The final formula also holds for the case m<n.
Each outer product uiviT is a rank-1 matrix of unit 2-norm. Thanks to the ordering of singular values, then, Equation (7.5.1) expresses A as a sum of decreasingly important contributions. This motivates the definition, for 1≤k≤r,
where Uk and Vk are the first k columns of U and V, respectively, and Sk is the upper-left k×k submatrix of S.
The rank of a sum of matrices is always less than or equal to the sum of the ranks, so Ak is a rank-k approximation to A. It turns out that Ak is the best rank-k approximation of A, as measured in the matrix 2-norm.
If the singular values of A decrease sufficiently rapidly, then Ak may capture the most significant behavior of the matrix for a reasonably small value of k.
The use of dimension reduction offered by low-rank SVD approximation goes well beyond simply reducing computation time. By isolating the most important contributions to the matrix, dimension reduction can uncover deep connections and trends that are otherwise obscured by weaker effects and noise.
One useful way to quantify the decay in the singular values is to compute
Clearly 0≤τk≤1 and τk is non-decreasing as a function of k. We can think of τk as the fraction of energy (or in statistical terms, variance) contained in the singular values up to and including the kth.[1]
Not all data sets can be reduced effectively to a small number of dimensions, but as Example 7.5.2 illustrates, in some cases reduction reveals information that corresponds to real-world understanding.