
Power iteration

Given that matrix-vector multiplication is fast for sparse matrices, let’s see what we might accomplish with only that at our disposal.

There was a little cheating in Example 8.2.1 to make the story come out neatly (specifically, the normalization step after creating a random matrix). But it illustrates an important general fact that we investigate now.

8.2.1 Dominant eigenvalue

Analysis of matrix powers is most straightforward in the diagonalizable case. Let $\mathbf{A}$ be any diagonalizable $n\times n$ matrix having eigenvalues $\lambda_1,\ldots,\lambda_n$ and corresponding linearly independent eigenvectors $\mathbf{v}_1,\ldots,\mathbf{v}_n$. For our later convenience (and without losing any generality), we assume that the eigenvectors are normalized so that

\twonorm{\mathbf{v}_j} = 1 \quad \text{for } j=1,\ldots,n.

We also make an important assumption about the eigenvalue magnitudes that will hold for most but not all matrices: the eigenvalues can be numbered so that

|\lambda_1| > |\lambda_2| \ge |\lambda_3| \ge \cdots \ge |\lambda_n|.

When this condition holds, we call $\lambda_1$ the dominant eigenvalue.

In Example 8.2.1, for instance, $\lambda_1=1$ is the dominant eigenvalue.

Now let $\mathbf{x}$ be an $n$-vector, let $k$ be a positive integer, and refer to (7.2.11):

\mathbf{A}^k \mathbf{x} = \mathbf{V}\mathbf{D}^k\mathbf{V}^{-1}\mathbf{x}.

Let $\mathbf{z}=\mathbf{V}^{-1}\mathbf{x}$, and recall that $\mathbf{D}$ is a diagonal matrix of eigenvalues. Then

\begin{split} \mathbf{A}^k\mathbf{x} &= \mathbf{V}\mathbf{D}^k \mathbf{z} = \mathbf{V}\begin{bmatrix} \lambda_1^k z_1 \\[0.5ex] \lambda_2^k z_2 \\ \vdots \\ \lambda_n^k z_n \end{bmatrix} \\ &= \lambda_1^k \left[ z_1 \mathbf{v}_{1} + z_2 \left(\frac{\lambda_2}{\lambda_1}\right)^k \mathbf{v}_{2} + \cdots + z_n \left(\frac{\lambda_n}{\lambda_1}\right)^k \mathbf{v}_{n} \right]. \end{split}

Since $\lambda_1$ is dominant, we conclude that if $z_1\neq 0$,

\begin{split} \twonorm{ \frac{ \mathbf{A}^k\mathbf{x}}{z_1 \lambda_1^k} - \mathbf{v}_1 } & \le \underbrace{\left|\frac{z_2}{z_1}\right|}_{c_2} \cdot \underbrace{ \left| \frac{\lambda_2}{\lambda_1} \right|^k}_{r_2^k} \twonorm{\mathbf{v}_{2}} + \cdots + \underbrace{\left|\frac{z_n}{z_1}\right|}_{c_n} \cdot \underbrace{\left|\frac{\lambda_n}{\lambda_1}\right|^k}_{r_n^k} \twonorm{\mathbf{v}_{n}} \\[1mm] & = \sum_{j=2}^n c_j r_j^k \cdot 1 \\ & \rightarrow 0 \text{ as $k\rightarrow \infty$}, \end{split}

since, by (8.2.2), $r_j = | \lambda_j / \lambda_1 | < 1$ for $j=2,\ldots,n$. If we choose $\mathbf{x}$ randomly, then the probability that $z_1=0$ is zero, so we will not be concerned with that case.
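This convergence of direction is easy to observe numerically. The following sketch, in Python with NumPy (not code from the text), builds a diagonalizable matrix with a known dominant eigenpair and checks that the direction of $\mathbf{A}^k\mathbf{x}$ approaches that of $\mathbf{v}_1$; the particular eigenvalues and random seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Build A = V D V^{-1} with known eigenvalues; lambda_1 = 1 is dominant.
D = np.diag([1.0, 0.75, 0.6, -0.4, 0.1])
V = rng.standard_normal((n, n))            # columns play the role of v_1, ..., v_n
A = V @ D @ np.linalg.inv(V)

v1 = V[:, 0] / np.linalg.norm(V[:, 0])     # normalized dominant eigenvector
x = rng.standard_normal(n)                 # random start, so z_1 != 0 almost surely

for k in range(1, 31):
    x = A @ x
    xhat = x / np.linalg.norm(x)
    # Distance from the direction of A^k x to the v_1 direction (sign removed).
    dist = min(np.linalg.norm(xhat - v1), np.linalg.norm(xhat + v1))
    if k % 10 == 0:
        print(f"k = {k:2d}: distance to the v_1 direction is {dist:.2e}")
```

With the eigenvalues chosen above, the printed distances shrink roughly like $(0.75)^k$, in line with the bound.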

8.2.2 Algorithm

An important technicality separates us from an algorithm: unless $|\lambda_1|=1$, the factor $\lambda_1^k$ tends to make $\|\mathbf{A}^k\mathbf{x}\|$ either very large or very small. In practice, we cannot easily normalize by $\lambda_1^k$ as we did in (8.2.5), since we don't know $\lambda_1$ in advance.

This issue is resolved by alternating matrix–vector multiplications with renormalizations: at each iteration, the current vector is multiplied by $\mathbf{A}$ to produce a vector $\mathbf{y}_k$, which is then rescaled to unit 2-norm before the next multiplication.

Observe that we can write

\mathbf{x}_{k} = (\alpha_1 \alpha_2 \cdots \alpha_k ) \mathbf{A}^k \mathbf{x}_{1},

where $\alpha_j = \twonorm{\mathbf{y}_j}^{-1}$ is the normalization factor at iteration $j$. Thus, the reasoning of (8.2.5) implies that $\mathbf{x}_k$ converges to a scalar multiple of the dominant eigenvector $\mathbf{v}_1$ of $\mathbf{A}$. Specifically, suppose that there is some number $\gamma$ with $|\gamma|=1$ such that

\mathbf{x}_k - \gamma \mathbf{v}_1 = \epsilon \mathbf{w},

where $\norm{\mathbf{w}} = 1$ and $\epsilon \ll 1$. Then

\begin{split} \beta_k &= \mathbf{x}_k^* \mathbf{y}_k \\ &= \mathbf{x}_k^* \mathbf{A} \mathbf{x}_k \\ &= \bigl(\overline{\gamma} \mathbf{v}_1^* + \epsilon \mathbf{w}^*\bigr) \mathbf{A} \bigl(\gamma \mathbf{v}_1 + \epsilon \mathbf{w}\bigr) \\ & = |\gamma|^2 \mathbf{v}_1^* (\lambda_1 \mathbf{v}_1) + O(\epsilon) \\ & = \lambda_1 + O(\epsilon). \end{split}

That is, $\beta_k$ from the power iteration estimates $\lambda_1$ about as well as $\mathbf{x}_k$ estimates the dominant eigenvector.

Function 8.2.1 is our implementation of power iteration.
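Function 8.2.1 itself is not reproduced here, so the following is a minimal sketch of the same kind of iteration in Python with NumPy. The function name `poweriter`, the random unit starting vector, and the fixed iteration count `numiter` are illustrative assumptions rather than the text's own code; the essential steps are the ones described above: multiply, record $\beta_k = \mathbf{x}_k^* \mathbf{y}_k$, and renormalize.

```python
import numpy as np

def poweriter(A, numiter):
    """Power iteration sketch: returns the eigenvalue estimates beta_k
    and the final normalized iterate x."""
    n = A.shape[0]
    x = np.random.default_rng(1).standard_normal(n)
    x = x / np.linalg.norm(x)          # unit-norm starting vector
    beta = []
    for _ in range(numiter):
        y = A @ x                      # matrix-vector multiplication
        beta.append(np.vdot(x, y))     # eigenvalue estimate beta_k = x_k^* y_k
        x = y / np.linalg.norm(y)      # renormalize to unit 2-norm
    return np.array(beta), x

# Example: the dominant eigenvalue of this matrix is (5 + sqrt(5))/2, about 3.618.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
beta, x = poweriter(A, 20)
print(beta[-1])
```

The renormalization step is exactly what keeps the iterates from overflowing or underflowing, while leaving their direction, and hence the estimates $\beta_k$, unchanged.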

8.2.3 Convergence rate

While we now feel confident that the sequence $\{\beta_k\}$ converges to the dominant eigenvalue $\lambda_1$, we would like to know how fast this convergence is. Looking back at (8.2.4), we know that our normalizations make it so that

\mathbf{x}_{k} - \gamma \mathbf{v}_1 = b_2 \left(\frac{\lambda_2}{\lambda_1}\right)^k \mathbf{v}_{2} + \cdots + b_n \left(\frac{\lambda_n}{\lambda_1}\right)^k \mathbf{v}_{n}

for some constants $b_2,\ldots,b_n$. If we now make a stronger assumption that $\lambda_2$ dominates the rest of the eigenvalues, i.e.,

|\lambda_1| > |\lambda_2| > |\lambda_3| \ge \cdots \ge |\lambda_n|,

then the expression on the right-hand side of (8.2.10) is dominated by its first term, because

\sum_{j=2}^n b_j \left(\frac{\lambda_j}{\lambda_1}\right)^k \mathbf{v}_{j} = \left(\frac{\lambda_2}{\lambda_1}\right)^k \left[ b_2 \mathbf{v}_2 + \underbrace{\sum_{j=3}^n b_j \left(\frac{\lambda_j}{\lambda_2}\right)^k \mathbf{v}_{j} }_{\to 0\, \text{ as } k\to\infty} \right].

Therefore, (8.2.8) now implies that $|\beta_k - \lambda_1|$ is dominated by $|\lambda_2 / \lambda_1|^k$, which is a case of linear convergence.
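This rate is easy to check experimentally when the eigenvalues are known. The sketch below (Python with NumPy, not from the text) runs the normalized iteration on an upper triangular test matrix, whose eigenvalues are simply its diagonal entries, and prints successive error ratios, which should settle near $|\lambda_2/\lambda_1|$; the particular matrix and seed are arbitrary choices.

```python
import numpy as np

# Upper triangular test matrix: its eigenvalues are the diagonal entries,
# so lambda_1 = 1.0 and lambda_2 = 0.7 are known exactly.
lam = np.array([1.0, 0.7, 0.4, 0.1])
A = np.triu(np.ones((4, 4))) * lam

x = np.random.default_rng(2).standard_normal(4)
x = x / np.linalg.norm(x)
err = []
for k in range(30):
    y = A @ x
    err.append(abs(np.vdot(x, y) - lam[0]))    # |beta_k - lambda_1|
    x = y / np.linalg.norm(y)

# Successive error ratios should approach |lambda_2 / lambda_1| = 0.7.
ratios = np.array(err[1:]) / np.array(err[:-1])
print(ratios[-5:])
```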

The practical utility of (8.2.13) is limited: if we knew $\lambda_1$ and $\lambda_2$, we wouldn't be running the power iteration in the first place! Sometimes it's possible to find estimates of or bounds on the ratio. If nothing else, though, it is useful to know that linear convergence is expected, at a rate determined solely by the two eigenvalues of largest magnitude.

8.2.4 Exercises