
Power iteration

Given that matrix-vector multiplication is fast for sparse matrices, let’s see what we might accomplish with only that at our disposal.

There was a little cheating in Example 8.2.1 to make the story come out neatly (specifically, the normalization step after creating a random matrix). But it illustrates an important general fact that we investigate now.

8.2.1 Dominant eigenvalue

Analysis of matrix powers is most straightforward in the diagonalizable case. Let $\mathbf{A}$ be any diagonalizable $n\times n$ matrix having eigenvalues $\lambda_1,\ldots,\lambda_n$ and corresponding linearly independent eigenvectors $\mathbf{v}_1,\ldots,\mathbf{v}_n$. For our later convenience (and without losing any generality), we assume that the eigenvectors are normalized so that

\twonorm{\mathbf{v}_j} = 1 \quad \text{for } j=1,\ldots,n.

We also make an important assumption about the eigenvalue magnitudes that will hold for most but not all matrices: the eigenvalues can be numbered so that

|\lambda_1| > |\lambda_2| \ge |\lambda_3| \ge \cdots \ge |\lambda_n|.

When this condition holds, we call $\lambda_1$ the dominant eigenvalue.

In Example 8.2.1, for instance, $\lambda_1=1$ is the dominant eigenvalue.

Now let $\mathbf{x}$ be an $n$-vector, let $k$ be a positive integer, and refer to (7.2.11):

\mathbf{A}^k \mathbf{x} = \mathbf{V}\mathbf{D}^k\mathbf{V}^{-1}\mathbf{x}.

Let $\mathbf{z}=\mathbf{V}^{-1}\mathbf{x}$, and recall that $\mathbf{D}$ is a diagonal matrix of eigenvalues. Then

\begin{split} \mathbf{A}^k\mathbf{x} &= \mathbf{V}\mathbf{D}^k \mathbf{z} = \mathbf{V}\begin{bmatrix} \lambda_1^k z_1 \\[0.5ex] \lambda_2^k z_2 \\ \vdots \\ \lambda_n^k z_n \end{bmatrix} \\ &= \lambda_1^k \left[ z_1 \mathbf{v}_{1} + z_2 \left(\frac{\lambda_2}{\lambda_1}\right)^k \mathbf{v}_{2} + \cdots + z_n \left(\frac{\lambda_n}{\lambda_1}\right)^k \mathbf{v}_{n} \right]. \end{split}

Since $\lambda_1$ is dominant, we conclude that if $z_1\neq 0$,

\begin{split} \twonorm{ \frac{ \mathbf{A}^k\mathbf{x}}{z_1 \lambda_1^k} - \mathbf{v}_1 } & \le \underbrace{\left|\frac{z_2}{z_1}\right|}_{c_2} \cdot \underbrace{ \left| \frac{\lambda_2}{\lambda_1} \right|^k}_{r_2^k} \twonorm{\mathbf{v}_{2}} + \cdots + \underbrace{\left|\frac{z_n}{z_1}\right|}_{c_n} \cdot \underbrace{\left|\frac{\lambda_n}{\lambda_1}\right|^k}_{r_n^k} \twonorm{\mathbf{v}_{n}} \\[1mm] & = \sum_{j=2}^n c_j r_j^k \cdot 1 \\ & \rightarrow 0 \text{ as $k\rightarrow \infty$}, \end{split}

since, by (8.2.2), $r_j = | \lambda_j / \lambda_1 | < 1$ for $j=2,\ldots,n$. If we choose $\mathbf{x}$ randomly, then the probability that $z_1=0$ is zero, so we will not be concerned with that case.
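This convergence of direction is easy to observe numerically. The following sketch, in Python with NumPy (not code from the text), builds a diagonalizable matrix with a known dominant eigenpair and checks that the direction of $\mathbf{A}^k\mathbf{x}$ approaches that of $\mathbf{v}_1$; the particular eigenvalues and random seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Build A = V D V^{-1} with known eigenvalues; lambda_1 = 1 is dominant.
D = np.diag([1.0, 0.75, 0.6, -0.4, 0.1])
V = rng.standard_normal((n, n))            # columns play the role of v_1, ..., v_n
A = V @ D @ np.linalg.inv(V)

v1 = V[:, 0] / np.linalg.norm(V[:, 0])     # normalized dominant eigenvector
x = rng.standard_normal(n)                 # random start, so z_1 != 0 almost surely

for k in range(1, 31):
    x = A @ x
    xhat = x / np.linalg.norm(x)
    # Distance from the direction of A^k x to the v_1 direction (sign removed).
    dist = min(np.linalg.norm(xhat - v1), np.linalg.norm(xhat + v1))
    if k % 10 == 0:
        print(f"k = {k:2d}: distance to the v_1 direction is {dist:.2e}")
```

With the eigenvalues chosen above, the printed distances shrink roughly like $(0.75)^k$, in line with the bound.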

8.2.2 Algorithm

An important technicality separates us from an algorithm: unless $|\lambda_1|=1$, the factor $\lambda_1^k$ tends to make $\|\mathbf{A}^k\mathbf{x}\|$ either very large or very small. In practice, we cannot easily normalize by $\lambda_1^k$ as we did in (8.2.5), since we don't know $\lambda_1$ in advance.

This issue is resolved by alternating matrix–vector multiplications with renormalizations: at each iteration, the current vector is multiplied by $\mathbf{A}$ to produce a vector $\mathbf{y}_k$, which is then rescaled to unit 2-norm before the next multiplication.

Observe that we can write

\mathbf{x}_{k} = (\alpha_1 \alpha_2 \cdots \alpha_k ) \mathbf{A}^k \mathbf{x}_{1},

where $\alpha_j = \twonorm{\mathbf{y}_j}^{-1}$ is the normalization factor at iteration $j$. Thus, the reasoning of (8.2.5) implies that $\mathbf{x}_k$ converges to a scalar multiple of the dominant eigenvector $\mathbf{v}_1$ of $\mathbf{A}$. Specifically, suppose that there is some number $\gamma$ with $|\gamma|=1$ such that

\mathbf{x}_k - \gamma \mathbf{v}_1 = \epsilon \mathbf{w},

where $\norm{\mathbf{w}} = 1$ and $\epsilon \ll 1$. Then

\begin{split} \beta_k &= \mathbf{x}_k^* \mathbf{y}_k \\ &= \mathbf{x}_k^* \mathbf{A} \mathbf{x}_k \\ &= \bigl(\overline{\gamma} \mathbf{v}_1^* + \epsilon \mathbf{w}^*\bigr) \mathbf{A} \bigl(\gamma \mathbf{v}_1 + \epsilon \mathbf{w}\bigr) \\ & = |\gamma|^2 \mathbf{v}_1^* (\lambda_1 \mathbf{v}_1) + O(\epsilon) \\ & = \lambda_1 + O(\epsilon). \end{split}

That is, $\beta_k$ from the power iteration estimates $\lambda_1$ about as well as $\mathbf{x}_k$ estimates the dominant eigenvector.

Function 8.2.1 is our implementation of power iteration.
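Function 8.2.1 itself is not reproduced here, so the following is a minimal sketch of the same kind of iteration in Python with NumPy. The function name `poweriter`, the random unit starting vector, and the fixed iteration count `numiter` are illustrative assumptions rather than the text's own code; the essential steps are the ones described above: multiply, record $\beta_k = \mathbf{x}_k^* \mathbf{y}_k$, and renormalize.

```python
import numpy as np

def poweriter(A, numiter):
    """Power iteration sketch: returns the eigenvalue estimates beta_k
    and the final normalized iterate x."""
    n = A.shape[0]
    x = np.random.default_rng(1).standard_normal(n)
    x = x / np.linalg.norm(x)          # unit-norm starting vector
    beta = []
    for _ in range(numiter):
        y = A @ x                      # matrix-vector multiplication
        beta.append(np.vdot(x, y))     # eigenvalue estimate beta_k = x_k^* y_k
        x = y / np.linalg.norm(y)      # renormalize to unit 2-norm
    return np.array(beta), x

# Example: the dominant eigenvalue of this matrix is (5 + sqrt(5))/2, about 3.618.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
beta, x = poweriter(A, 20)
print(beta[-1])
```

The renormalization step is exactly what keeps the iterates from overflowing or underflowing, while leaving their direction, and hence the estimates $\beta_k$, unchanged.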

8.2.3 Convergence rate

While we now feel confident that the sequence $\{\beta_k\}$ converges to the dominant eigenvalue $\lambda_1$, we would like to know how fast this convergence is. Looking back at (8.2.4), we know that our normalizations make it so that

\mathbf{x}_{k} - \gamma \mathbf{v}_1 = b_2 \left(\frac{\lambda_2}{\lambda_1}\right)^k \mathbf{v}_{2} + \cdots + b_n \left(\frac{\lambda_n}{\lambda_1}\right)^k \mathbf{v}_{n}

for some constants $b_2,\ldots,b_n$. If we now make a stronger assumption that $\lambda_2$ dominates the rest of the eigenvalues, i.e.,

|\lambda_1| > |\lambda_2| > |\lambda_3| \ge \cdots \ge |\lambda_n|,

then the expression on the right-hand side of (8.2.10) is dominated by its first term, because

\sum_{j=2}^n b_j \left(\frac{\lambda_j}{\lambda_1}\right)^k \mathbf{v}_{j} = \left(\frac{\lambda_2}{\lambda_1}\right)^k \left[ b_2 \mathbf{v}_2 + \underbrace{\sum_{j=3}^n b_j \left(\frac{\lambda_j}{\lambda_2}\right)^k \mathbf{v}_{j} }_{\to 0\, \text{ as } k\to\infty} \right].

Therefore, (8.2.8) now implies that $|\beta_k - \lambda_1|$ is dominated by $|\lambda_2 / \lambda_1|^k$, which is a case of linear convergence.
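This rate is easy to check experimentally when the eigenvalues are known. The sketch below (Python with NumPy, not from the text) runs the normalized iteration on an upper triangular test matrix, whose eigenvalues are simply its diagonal entries, and prints successive error ratios, which should settle near $|\lambda_2/\lambda_1|$; the particular matrix and seed are arbitrary choices.

```python
import numpy as np

# Upper triangular test matrix: its eigenvalues are the diagonal entries,
# so lambda_1 = 1.0 and lambda_2 = 0.7 are known exactly.
lam = np.array([1.0, 0.7, 0.4, 0.1])
A = np.triu(np.ones((4, 4))) * lam

x = np.random.default_rng(2).standard_normal(4)
x = x / np.linalg.norm(x)
err = []
for k in range(30):
    y = A @ x
    err.append(abs(np.vdot(x, y) - lam[0]))    # |beta_k - lambda_1|
    x = y / np.linalg.norm(y)

# Successive error ratios should approach |lambda_2 / lambda_1| = 0.7.
ratios = np.array(err[1:]) / np.array(err[:-1])
print(ratios[-5:])
```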

The practical utility of (8.2.13) is limited: if we knew $\lambda_1$ and $\lambda_2$, we wouldn't be running the power iteration in the first place! Sometimes it's possible to find estimates of or bounds on the ratio. If nothing else, though, it is useful to know that linear convergence is expected, at a rate determined solely by the two eigenvalues of largest magnitude.

8.2.4 Exercises