Symmetry and definiteness
Tobin A. Driscoll and Richard J. Braun
As we saw in Exploiting matrix structure, symmetry can simplify the LU factorization into the symmetric form $\mathbf{A}=\mathbf{L}\mathbf{D}\mathbf{L}^T$. Important specializations occur as well for the eigenvalue and singular value factorizations. In this section we stay with complex-valued matrices, so we are interested in the case when $\mathbf{A}^*=\mathbf{A}$, i.e., when $\mathbf{A}$ is hermitian. However, we often loosely speak of symmetry to mean this property even in the complex case. All the statements in this section easily specialize to the real case.
7.4.1 Normality

Suppose now that $\mathbf{A}^*=\mathbf{A}$ and that $\mathbf{A}=\mathbf{U}\mathbf{S}\mathbf{V}^*$ is an SVD. Since $\mathbf{S}$ is real and square, we have
$$\mathbf{A}^* = \mathbf{V} \mathbf{S}^* \mathbf{U}^* = \mathbf{V} \mathbf{S} \mathbf{U}^*,$$

and it's tempting to conclude that $\mathbf{U}=\mathbf{V}$. Happily, this is nearly true. The following theorem is typically proved in an advanced linear algebra course.
Theorem 7.4.1 (Spectral decomposition)
If $\mathbf{A}=\mathbf{A}^*$, then $\mathbf{A}$ has a diagonalization $\mathbf{A}=\mathbf{V}\mathbf{D}\mathbf{V}^{-1}$ in which $\mathbf{V}$ is unitary and $\mathbf{D}$ is diagonal and real. In other words, $\mathbf{A}$ is a normal matrix with real eigenvalues.
Because hermitian matrices are normal, their eigenvalue condition number is guaranteed to be 1 by Theorem 7.2.3. That fact makes eigenvalues a robust computational target in the hermitian case.
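As a quick numerical illustration of that robustness, consider perturbing a hermitian matrix by a tiny hermitian amount; the eigenvalues should shift by no more than the size of the perturbation. The following is a minimal Julia sketch using a random matrix of our own choosing, not an example from the text:

using LinearAlgebra
A = Hermitian(randn(6, 6))
E = Hermitian(randn(6, 6))
E = 1e-8 * E / opnorm(E)                  # hermitian perturbation of norm 1e-8
# Sorted eigenvalues of A and A + E differ by at most opnorm(E);
# a condition number of 1 means no amplification of the perturbation.
maximum(abs, eigvals(A + E) - eigvals(A))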
For a hermitian matrix, the EVD

$$\mathbf{A}=\mathbf{V}\mathbf{D}\mathbf{V}^{-1}=\mathbf{V} \mathbf{D} \mathbf{V}^*$$

is almost an SVD.
If $\mathbf{A}^*=\mathbf{A}$ and $\mathbf{A}=\mathbf{V}\mathbf{D}\mathbf{V}^{-1}$ is a unitary diagonalization, then

$$\mathbf{A} = (\mathbf{V}\mathbf{T})\cdot |\mathbf{D}|\cdot \mathbf{V}^*$$

is an SVD, where $|\mathbf{D}|$ is the elementwise absolute value and $\mathbf{T}$ is diagonal with $|T_{ii}|=1$ for all $i$. In particular, the absolute values of the eigenvalues of $\mathbf{A}$ are the singular values of $\mathbf{A}$.
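That last fact is easy to verify numerically. Here is a minimal Julia sketch, using a random symmetric matrix of our own construction rather than one from the text:

using LinearAlgebra
B = randn(5, 5)
A = B + B'                            # a random real symmetric matrix
abs_eigs = sort(abs.(eigvals(A)), rev=true)
abs_eigs ≈ svdvals(A)                 # true: |eigenvalues| match the singular values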
7.4.2 Rayleigh quotient

For a hermitian matrix $\mathbf{A}$, the number $\mathbf{x}^* \mathbf{A} \mathbf{x}$ acts much like a scalar quadratic term $ax^2$. In fact, it is always a real number.
To see why, note that the dimensions of the factors ensure that $\mathbf{x}^* \mathbf{A} \mathbf{x}$ is a scalar. Therefore, its complex conjugate is the same as its hermitian, and
$$\begin{align*}
\overline{\mathbf{x}^* \mathbf{A} \mathbf{x}} & = (\mathbf{x}^* \mathbf{A} \mathbf{x})^* \\
& = \mathbf{x}^* \mathbf{A}^* (\mathbf{x}^*)^* \\
& = \mathbf{x}^* \mathbf{A} \mathbf{x},
\end{align*}$$

where the last step uses the given hermitian property. Since any complex number that equals its own conjugate is real, the result follows.

Normalizing this quadratic form by the squared length of $\mathbf{x}$ gives the Rayleigh quotient,

$$R_{\mathbf{A}}(\mathbf{x}) = \frac{\mathbf{x}^* \mathbf{A} \mathbf{x}}{\mathbf{x}^* \mathbf{x}}.$$
The following facts can be established by straightforward calculations.
Theorem 7.4.4 (Rayleigh quotient)
If $\mathbf{A}^*=\mathbf{A}$, $\mathbf{v}$ is an eigenvector of $\mathbf{A}$, and $R_{\mathbf{A}}$ is the Rayleigh quotient, then:

1. $R_{\mathbf{A}}(\mathbf{v})=\lambda$, the associated eigenvalue, and
2. the gradient satisfies $\nabla R_{\mathbf{A}}(\mathbf{v})=\boldsymbol{0}$.

As a consequence of Theorem 7.4.4, the Rayleigh quotient can be used to turn an estimate of the eigenvector $\mathbf{v}$ into an estimate of its eigenvalue $\lambda$. Specifically,
$$R_{\mathbf{A}}(\mathbf{v}+\delta\mathbf{z}) = \lambda + O(\delta^2) \quad \text{as } \delta \to 0.$$
Example 7.4.1 (Rayleigh quotient)
Julia. We will use a symmetric matrix with a known EVD and eigenvalues equal to the integers from 1 to 20.
using LinearAlgebra
n = 20;
λ = 1:n
D = diagm(λ)
V, _ = qr(randn(n, n))    # get a random orthogonal V
A = V * D * V';
The Rayleigh quotient is a scalar-valued function of a vector.
R = x -> (x' * A * x) / (x' * x);
The Rayleigh quotient evaluated at an eigenvector gives the corresponding eigenvalue.
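For instance, by the construction above, column 7 of V is an eigenvector for the eigenvalue 7, so the following should return a value very close to 7 (a quick check we add here for illustration):

R(V[:, 7])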
If the input to the Rayleigh quotient is within a small δ of an eigenvector, its output is within $O(\delta^2)$ of the corresponding eigenvalue. In this experiment, we observe that each additional digit of accuracy in an approximate eigenvector gives two more digits to the eigenvalue estimate coming from the Rayleigh quotient.
δ = @. 1 / 10^(1:5)
eval_diff = zeros(size(δ))
for (k, delta) in enumerate(δ)
e = randn(n)
e = delta * e / norm(e)
x = V[:, 7] + e
eval_diff[k] = R(x) - 7
end
using PrettyTables
labels = ["perturbation δ", "δ²", "R(x) - λ"]
@pt :header = labels [δ δ .^ 2 eval_diff]
MATLAB. We will use a symmetric matrix with a known EVD and eigenvalues equal to the integers from 1 to 20.
n = 20;
lambda = 1:n;
D = diag(lambda);
[V, ~] = qr(randn(n, n)); % get a random orthogonal V
A = V * D * V';
The Rayleigh quotient is a scalar-valued function of a vector.
R = @(x) (x' * A * x) / (x' * x);
The Rayleigh quotient evaluated at an eigenvector gives the corresponding eigenvalue.
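For instance, by the construction above, column 6 of V is an eigenvector for the eigenvalue 6, so the following should return a value very close to 6 (a quick check we add here for illustration):

R(V(:, 6))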
If the input to the Rayleigh quotient is within a small δ of an eigenvector, its output is within $O(\delta^2)$ of the corresponding eigenvalue. In this experiment, we observe that each additional digit of accuracy in an approximate eigenvector gives two more digits to the eigenvalue estimate coming from the Rayleigh quotient.
delta = 1 ./ 10 .^ (1:5)';
dif = zeros(size(delta));
for k = 1:length(delta)
e = randn(n, 1);
e = delta(k) * e / norm(e);
x = V(:, 6) + e;
dif(k) = R(x) - lambda(6);
end
table(delta, dif, VariableNames=["perturbation size", "R(x) - lambda"])
Python. We will use a symmetric matrix with a known EVD and eigenvalues equal to the integers from 1 to 20.
from numpy import arange, diag, dot, random
from numpy.linalg import qr, norm
n = 20
d = arange(n) + 1
D = diag(d)
V, _ = qr(random.randn(n, n))  # get a random orthogonal V
A = V @ D @ V.T
The Rayleigh quotient is a scalar-valued function of a vector.
R = lambda x: dot(x, A @ x) / dot(x, x)
The Rayleigh quotient evaluated at an eigenvector gives the corresponding eigenvalue.
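For instance, by the construction above, column 5 of V is an eigenvector for the eigenvalue d[5] = 6, so the following should return a value very close to 6 (a quick check we add here for illustration):

R(V[:, 5])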
If the input to the Rayleigh quotient is within a small δ of an eigenvector, its output is within $O(\delta^2)$ of the corresponding eigenvalue. In this experiment, we observe that each additional digit of accuracy in an approximate eigenvector gives two more digits to the eigenvalue estimate coming from the Rayleigh quotient.
from prettytable import PrettyTable
results = PrettyTable(["perturbation size", "R.Q. - λ"])
for delta in 1 / 10 ** arange(1, 6):
e = random.randn(n)
e = delta * e / norm(e)
x = V[:, 5] + e
quotient = R(x)
results.add_row([delta, quotient - d[5]])
print(results)
+-------------------+------------------------+
| perturbation size | R.Q. - λ |
+-------------------+------------------------+
| 0.1 | 0.0388685275091305 |
| 0.01 | 0.00048415445331340123 |
| 0.001 | 1.7378343324381262e-06 |
| 0.0001 | 6.607612057507595e-08 |
| 1e-05 | 5.722506912775316e-10 |
+-------------------+------------------------+
7.4.3 Definite, semidefinite, and indefinite matrices

In the real case, we called a symmetric matrix $\mathbf{A}$ SPD if $\mathbf{x}^T \mathbf{A}\mathbf{x} > 0$ for all nonzero vectors $\mathbf{x}$. There is an analogous definition for complex matrices: a hermitian matrix $\mathbf{A}$ is hermitian positive definite (HPD) if $\mathbf{x}^* \mathbf{A}\mathbf{x} > 0$ for all nonzero vectors $\mathbf{x}\in\mathbb{C}^n$.
It's common to use the term positive definite without either the "symmetric" or the "hermitian" adjective. This can be confusing, because the definiteness property is of little value without the symmetry property, which is therefore generally considered to be implied.
Putting the HPD property together with the Rayleigh quotient leads to the following.

Theorem 7.4.5
If $\mathbf{A}^*=\mathbf{A}$, then the following statements are equivalent:

1. $\mathbf{A}$ is HPD.
2. The eigenvalues of $\mathbf{A}$ are all positive.
Suppose item 1 is true. If $\mathbf{A}\mathbf{x} = \lambda \mathbf{x}$ is an eigenpair, then a Rayleigh quotient implies that

$$\lambda = \frac{ \mathbf{x}^*\mathbf{A}\mathbf{x} }{\mathbf{x}^*\mathbf{x}} > 0.$$

Hence, item 2 is true. Conversely, suppose item 2 is known. Then we can write the EVD as $\mathbf{A}=\mathbf{V}\mathbf{S}^2\mathbf{V}^*$, where the $S_{ii}$ are positive square roots of the eigenvalues. Hence, for any nonzero $\mathbf{x}$,

$$\mathbf{x}^*\mathbf{A}\mathbf{x} = \mathbf{x}^*\mathbf{V}\mathbf{S}^2\mathbf{V}^*\mathbf{x} = \|\mathbf{S}\mathbf{V}^*\mathbf{x}\|^2 > 0,$$

as both $\mathbf{S}$ and $\mathbf{V}$ are invertible. Thus, item 1 is true.
According to Theorem 7.4.5, for an HPD matrix, the EVD $\mathbf{A}=\mathbf{V}\mathbf{D}\mathbf{V}^*$ meets all the requirements of an SVD, provided the ordering of eigenvalues is chosen appropriately.
A hermitian matrix with all negative eigenvalues is called negative definite, and one with eigenvalues of different signs is indefinite. Finally, if one or more eigenvalues are zero and the rest have one sign, it is positive or negative semidefinite.
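Since these categories are determined entirely by the signs of the eigenvalues, they are easy to test numerically. Here is a minimal Julia sketch; the helper name definiteness and the tolerance are our own choices, not from the text, and a tolerance is needed because computed eigenvalues carry roundoff:

using LinearAlgebra
# Hypothetical helper: classify a hermitian matrix by its eigenvalue signs.
function definiteness(A; tol=1e-10)
    λ = eigvals(Hermitian(A))    # real eigenvalues, in increasing order
    if all(λ .> tol)
        return "positive definite"
    elseif all(λ .< -tol)
        return "negative definite"
    elseif all(λ .> -tol)
        return "positive semidefinite"
    elseif all(λ .< tol)
        return "negative semidefinite"
    else
        return "indefinite"
    end
end

definiteness([2 1; 1 2])    # positive definite (eigenvalues 1 and 3)
definiteness([1 2; 2 1])    # indefinite (eigenvalues -1 and 3)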
7.4.4 Exercises

✍ Each line below is an EVD for a hermitian matrix. State whether the matrix is definite, indefinite, or semidefinite. Then state whether the given factorization is also an SVD, and if it is not, modify it to find an SVD.
(a)
$$\begin{bmatrix}
0 & 0 \\ 0 & -1
\end{bmatrix} = \begin{bmatrix}
0 & 1 \\ 1 & 0
\end{bmatrix} \begin{bmatrix}
-1 & 0 \\ 0 & 0
\end{bmatrix} \begin{bmatrix}
0 & 1 \\ 1 & 0
\end{bmatrix}$$
(b)
$$\begin{bmatrix}
4 & -2 \\ -2 & 1
\end{bmatrix} = \begin{bmatrix}
1 & -0.5 \\ -0.5 & -1
\end{bmatrix} \begin{bmatrix}
5 & 0 \\ 0 & 0
\end{bmatrix} \begin{bmatrix}
0.8 & -0.4 \\ -0.4 & -0.8
\end{bmatrix}$$
(c)
$$\begin{bmatrix}
-5 & 3 \\ 3 & -5
\end{bmatrix} = \begin{bmatrix}
\alpha & \alpha \\ \alpha & -\alpha
\end{bmatrix} \begin{bmatrix}
-2 & 0 \\ 0 & -8
\end{bmatrix} \begin{bmatrix}
\alpha & \alpha \\ \alpha & -\alpha
\end{bmatrix}, \quad \alpha=1/\sqrt{2}$$
⌨ The matrix names below are found in MatrixDepot for Julia, gallery for MATLAB, and rogues for Python; you will have to adjust the syntax accordingly. For each matrix, determine whether it is positive definite, negative definite, positive or negative semidefinite, or indefinite.
(a) pei(5) $- 6\mathbf{I}$
(b) hilb(8) $- 2\mathbf{I}$
(c) dingdong(20)
(d) lehmer(100)
(e) fiedler(200)
⌨ The range of the function $R_{\mathbf{A}}(\mathbf{x})$ is a subset of the complex plane known as the field of values of the matrix $\mathbf{A}$.
(a) Use 1000 random real vectors to plot points in the field of values of the matrix
$$\begin{bmatrix}
1 & 0 & -2\\
0 & 2 & 0\\
-2 & 0 & 1
\end{bmatrix}.$$

You should get 1000 dots lying on the real axis.
(b) Compute the eigenvalues of the matrix. By comparison to the plot from (a), guess what the exact field of values is.
✍ The matrix $\mathbf{A}=\begin{bmatrix} 3 & -2 \\ -2 & 0 \end{bmatrix}$ has an eigenvector $[1,\, 2]$.

(a) Write out $R_{\mathbf{A}}(\mathbf{x})$ explicitly as a function of $x_1$ and $x_2$.

(b) Find $R_{\mathbf{A}}(\mathbf{x})$ for $x_1=1$, $x_2=2$.

(c) Find the gradient vector $\nabla R_{\mathbf{A}}(\mathbf{x})$.

(d) Show that the gradient vector is zero when $x_1=1$, $x_2=2$.
⌨ Thanks largely to Theorem 7.4.1, the eigenvalue problem for symmetric/hermitian matrices is easier than for general matrices.

(a) Let $\mathbf{A}$ be a $1000\times 1000$ random real matrix, and let $\mathbf{S}=\mathbf{A}+\mathbf{A}^T$. Time finding the eigenvalues of $\mathbf{A}$ and then of $\mathbf{S}$. You should find that the computation for $\mathbf{S}$ is around an order of magnitude faster.

(b) Perform the experiment from part (a) on $n\times n$ matrices for $n=200,300,\ldots,1600$. Plot running time as a function of $n$ for both matrices on a single log-log plot. Is the ratio of running times roughly constant, or does it grow with $n$?