Arnoldi Method

Basics

  • The method of Arnoldi (1951) was originally proposed as a means of reducing a matrix to Hessenberg form.
  • Arnoldi suggested that the process could provide good approximations to some eigenvalues if stopped before completion. This idea was later fully developed by Saad (1980) and other authors, and the method has evolved into one of the most successful ones for the non-Hermitian eigenproblem.
  • Many variants of the method have been proposed. In this note, we give general descriptions of some of them.

Arnoldi’s Idea

  • Given a square matrix $\mathbf{A}$ of order $n$, if $n$ steps of Arnoldi’s method are carried out, then an orthogonal reduction to Hessenberg form is achieved such that $\mathbf{A}\mathbf{V} = \mathbf{V}\mathbf{H}$, where $\mathbf{H}$ is an upper Hessenberg matrix of order $n$ with positive subdiagonal elements and $\mathbf{V}$ is an orthogonal matrix (a numerical check of this relation is sketched after this list).
  • $\mathbf{V}$ and $\mathbf{H}$ are uniquely determined by the first column of $\mathbf{V}$, a unit-norm vector $v_1 = V e_1$ that is called the initial vector.
  • For some initial vectors, Arnoldi’s method breaks down after $m < n$ steps. In that case, the algorithm produces an $n\times m$ matrix $\mathbf{V}_m$ with orthonormal columns and an upper Hessenberg matrix $\mathbf{H}_m$ of order $m$ that satisfy $$\mathbf{A}\mathbf{V}_m - \mathbf{V}_m \mathbf{H}_m = 0.$$ This implies that the columns of $\mathbf{V}_m$ span an invariant subspace of the matrix $\mathbf{A}$.
  • In the general case, in which the algorithm does not break down, after $m$ steps the following relation holds: $$ \begin{equation} \mathbf{A}\mathbf{V}_m - \mathbf{V}_m \mathbf{H}_m = f e^{*}_m, \label{1} \end{equation} $$ where the vector $f$ is usually called the residual of the $m$-step Arnoldi factorization.
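As a quick numerical check of the $n$-step relation $\mathbf{A}\mathbf{V} = \mathbf{V}\mathbf{H}$, the sketch below uses SciPy’s `scipy.linalg.hessenberg`, which produces the same kind of factorization, although by Householder reflections rather than by Arnoldi’s process (the test matrix and its size are arbitrary choices):

    # Check A V = V H for the full (n-step) Hessenberg reduction.
    # Uses SciPy's Householder-based reduction, not Arnoldi's process itself.
    import numpy as np
    from scipy.linalg import hessenberg

    n = 6
    A = np.random.default_rng(0).standard_normal((n, n))
    H, V = hessenberg(A, calc_q=True)        # A = V H V^T

    print(np.allclose(A @ V, V @ H))         # True: A V = V H
    print(np.allclose(V.T @ V, np.eye(n)))   # True: V is orthogonal
    print(np.allclose(np.tril(H, -2), 0))    # True: H is upper Hessenberg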

Algorithm

Basic Arnoldi

🏷️ Algorithm 1

Input: Matrix A, number of steps m, and initial vector v_1 of norm 1
Output: (V_m, H_m, f, β) so that A V_m - V_m H_m = f e^{*}_m, β = ||f||_2
    For j = 1, 2, ..., m-1
      w = A v_j
      Orthogonalize w with respect to V_j (obtaining h_{1:j, j})
      h_{j+1, j} = ||w||_2
      If h_{j+1, j} = 0, stop
      v_{j+1} = w / h_{j+1, j}
    end
    f = A v_m
    Orthogonalize f with respect to V_m (obtaining h_{1:m, m})
    β = ||f||_2
  • Algorithm 1 builds an orthonormal basis of the Krylov subspace $\mathcal{K}_m (\mathbf{A}, v_1) = \text{span} \{ \mathbf{A}^0 v_1, \mathbf{A}^1 v_1, \mathbf{A}^2 v_1, \ldots, \mathbf{A}^{m-1} v_1 \}$. Note that $\mathbf{A}^0 v_1 = v_1$ because $\mathbf{A}^0 = \mathbf{I}$, the identity matrix.
  • In order to maintain a good level of orthogonality in finite precision arithmetic, an iterative Gram-Schmidt procedure (classical Gram-Schmidt with reorthogonalization) should be used for the orthogonalization steps.
  • Computed basis vectors are the columns of $\mathbf{V}_m$, which are called Arnoldi vectors.
  • If some $w$ is zero after orthogonalization, then the algorithm breaks down before completing $m$ steps. In that case, the residual of the Arnoldi factorization is zero and $\mathcal{K}_j (\mathbf{A}, v_1)$, with $j < m$, is an exact invariant subspace of $\mathbf{A}$. However, this situation is very unlikely in practice due to finite precision arithmetic. A sketch of Algorithm 1 in code is given after this list.
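Below is a minimal NumPy sketch of Algorithm 1 in real arithmetic, with the final step folded into the loop and a fixed second Gram-Schmidt pass standing in for the iterative procedure mentioned above (the function name and the random test problem are illustrative choices, not part of the algorithm):

    import numpy as np

    def arnoldi(A, v1, m):
        # m-step Arnoldi factorization: A V_m - V_m H_m = f e_m^*.
        # Returns (V, H, f, beta); stops early if the method breaks down.
        n = A.shape[0]
        V = np.zeros((n, m))
        H = np.zeros((m, m))
        V[:, 0] = v1 / np.linalg.norm(v1)
        for j in range(m):
            w = A @ V[:, j]
            # Classical Gram-Schmidt with one reorthogonalization pass
            for _ in range(2):
                c = V[:, :j + 1].T @ w
                w = w - V[:, :j + 1] @ c
                H[:j + 1, j] += c
            if j < m - 1:
                h = np.linalg.norm(w)
                if h == 0.0:   # breakdown: exact invariant subspace found
                    return V[:, :j + 1], H[:j + 1, :j + 1], w, 0.0
                H[j + 1, j] = h
                V[:, j + 1] = w / h
        return V, H, w, np.linalg.norm(w)

    # Sanity check of relation (1) on a random matrix
    rng = np.random.default_rng(1)
    A = rng.standard_normal((100, 100))
    v1 = rng.standard_normal(100)
    V, H, f, beta = arnoldi(A, v1, 20)
    e_m = np.zeros(20); e_m[-1] = 1.0
    print(np.allclose(A @ V - V @ H, np.outer(f, e_m)))  # True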

Multiplying $\eqref{1}$ on the left by $\mathbf{V}^{*}_m$ gives $$ \mathbf{V}^{*}_m \mathbf{A} \mathbf{V}_m - \mathbf{V}^{*}_m \mathbf{V}_m \mathbf{H}_m = \mathbf{V}^{*}_m f e^{*}_m. $$ Since the columns of $\mathbf{V}_m$ are orthonormal, $\mathbf{V}^{*}_m \mathbf{V}_m = \mathbf{I}$, the identity matrix, and since $f$ is orthogonalized against the columns of $\mathbf{V}_m$ by construction, $\mathbf{V}^{*}_m f = 0$. It follows that $\mathbf{V}^{*}_m \mathbf{A}\mathbf{V}_m = \mathbf{H}_m$, that is, the matrix $\mathbf{H}_m$ represents the orthogonal projection of $\mathbf{A}$ onto the Krylov subspace.

  • $\mathbf{V}^{*}_m \mathbf{A}\mathbf{V}_m = \mathbf{H}_m$ allows us to compute Rayleigh-Ritz approximations of the eigenpairs of $\mathbf{A}$.

Let $(\lambda_i, y_i)$ be an eigenpair of the matrix $\mathbf{H}_m$. Then the Ritz value $\lambda_i$ and the Ritz vector $x_i = \mathbf{V}_m y_i$ can be taken as an approximate eigenpair of $\mathbf{A}$. Typically, only a small percentage of the $m$ approximations are good. This can be assessed by means of the residual norm of the Ritz pair, which turns out to be very easy to compute: using $\mathbf{H}_m y_i = \lambda_i y_i$ and $\eqref{1}$, $$ || \mathbf{A} x_i - \lambda_i x_i ||_2 = || \mathbf{A} \mathbf{V}_m y_i - \mathbf{V}_m \mathbf{H}_m y_i||_2 = || (\mathbf{A}\mathbf{V}_m - \mathbf{V}_m \mathbf{H}_m) y_i ||_2 = || f ||_2 \, |e^{*}_m y_i| = \beta \, |e^{*}_m y_i|. $$
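Reusing `A`, `V`, `H`, and `beta` from the sketch above, the Ritz pairs and their residual norms can be obtained as follows; the explicitly computed residual is included only to confirm the cheap estimate:

    # Ritz pairs of A from the small matrix H_m, with cheap residual norms
    evals, Y = np.linalg.eig(H)          # eigenpairs (lambda_i, y_i) of H_m
    est = beta * np.abs(Y[-1, :])        # beta * |e_m^* y_i| for every i
    for i in np.argsort(est)[:3]:        # the three best Ritz pairs
        x = V @ Y[:, i]                  # Ritz vector x_i = V_m y_i
        true_res = np.linalg.norm(A @ x - evals[i] * x)
        print(evals[i], est[i], true_res)  # estimate matches the true residual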

  • In practical situations, the number of steps $m$ required to obtain good approximations may be too large. The problem with a large $m$ is not only storage but also the computational cost, which grows with every step. For this reason, a restarted version of the algorithm is necessary in practice.
  • The general idea of restarting is that after $\mathbf{V}_m$ has been computed, for a fixed value of $m$, a new Arnoldi process is started, trying to benefit from the previously computed information.

We will use $ncv$ as a second name for $m$; it denotes the maximum number of vectors allowed for the basis.

Explicit Restart

  • The idea of explicit restart is to iteratively compute different $m$-step Arnoldi factorizations with successively “better” initial vectors.
  • The initial vector for the next Arnoldi run is computed from the information available in the most recent factorization.
  • The simplest way to select the new initial vector is to take the Ritz vector (or Schur vector) associated with the dominant eigenvalue, $v_1 = \mathbf{V}_m y_1$; a minimal sketch of this restart loop is given after this list.
  • For the restarted method to be effective in computing more than one eigenpair, the strategy is to keep track of already converged eigenpairs and perform some form of deflation. This is done with a technique usually called locking, in which vectors associated with converged eigenvalues are not modified in successive runs.
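Before moving on to locking, here is a minimal sketch of this restart loop for a single dominant eigenpair, reusing the `arnoldi` function from the earlier sketch (the tolerance, the restart cap, and taking the real part of the Ritz vector when restarting are illustrative practical choices):

    # Explicit restart for the dominant eigenpair, reusing arnoldi() above
    import numpy as np

    def dominant_eigenpair(A, m=20, tol=1e-10, max_restarts=100, seed=0):
        v = np.random.default_rng(seed).standard_normal(A.shape[0])
        for _ in range(max_restarts):
            V, H, f, beta = arnoldi(A, v, m)
            evals, Y = np.linalg.eig(H)
            i = np.argmax(np.abs(evals))      # dominant Ritz pair
            if beta * abs(Y[-1, i]) < tol:    # cheap residual estimate
                return evals[i], V @ Y[:, i]
            # restart with the (real part of the) dominant Ritz vector
            v = np.real(V @ Y[:, i])
        raise RuntimeError("no convergence within max_restarts")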

Suppose that after a certain Arnoldi run, the first $k$ eigenpairs have already converged to the desired accuracy, and write $\mathbf{V}_m$ as $$\mathbf{V}_m = \left[ \mathbf{V}_{1:k}^{(l)} \mid \mathbf{V}_{k+1:m}^{(a)} \right],$$ where the $(l)$ superscript indicates locked vectors and the $(a)$ superscript indicates active vectors.

  • In the next Arnoldi run, only $m - k$ Arnoldi vectors must be computed, the active ones, and in doing this the first $k$ vectors have to be deflated. This can be done simply by orthogonalizing every new Arnoldi vector also with respect to the locked ones, as illustrated in Algorithm 2.
🏷️ Algorithm 2

Input: Matrix A, number of steps m, V_{1:k} and H_{1:k} with k < m, and initial vector v_{k+1} of norm 1
Output: (V_m, H_m, f, β) so that A V_m - V_m H_m = f e^{*}_m, β = ||f||_2
    For j = k+1, ..., m-1
      w = A v_j
      Orthogonalize w with respect to V_j (obtaining h_{1:j, j})
      h_{j+1, j} = ||w||_2
      If h_{j+1, j} = 0, stop
      v_{j+1} = w / h_{j+1, j}
    end
    f = A v_m
    Orthogonalize f with respect to V_m (obtaining h_{1:m, m})
    β = ||f||_2
  • Note that Algorithm 2 only modifies the last $m-k$ columns of $\mathbf{V}_m$ and $\mathbf{H}_m$, the active part of the factorization.
  • All the operations of the algorithm are performed on this active part. These operations are the computation of the Arnoldi factorization with initial vector $v_{k+1}$ and then the computation of the Schur form. A simplified code sketch of the restart-with-locking loop is given below.
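The following sketch illustrates the same idea in a simplified form: instead of keeping the locked vectors as the leading columns of $\mathbf{V}_m$ and $\mathbf{H}_m$ as Algorithm 2 does, it stores them in a separate set $Q$ and deflates by orthogonalizing every new Arnoldi vector against $Q$. Locking Ritz vectors (rather than Schur vectors), working in complex arithmetic, and all numerical parameters are simplifying choices made here, not prescriptions of the method:

    # Explicitly restarted Arnoldi with locking (simplified sketch):
    # converged Ritz vectors are stored in Q and deflated out of later runs.
    import numpy as np

    def restarted_arnoldi(A, nev=3, ncv=20, tol=1e-8, max_restarts=500, seed=0):
        n = A.shape[0]
        rng = np.random.default_rng(seed)
        Q = np.zeros((n, 0), dtype=complex)      # locked (converged) vectors
        locked = []                              # locked Ritz values
        v = rng.standard_normal(n).astype(complex)
        for _ in range(max_restarts):
            m = ncv - Q.shape[1]                 # size of the active part
            V = np.zeros((n, m), dtype=complex)
            H = np.zeros((m, m), dtype=complex)
            v = v - Q @ (Q.conj().T @ v)         # deflate the initial vector
            V[:, 0] = v / np.linalg.norm(v)
            for j in range(m):                   # Arnoldi run with deflation
                w = A @ V[:, j]
                for _ in range(2):               # Gram-Schmidt, repeated once
                    w = w - Q @ (Q.conj().T @ w)      # vs locked vectors
                    c = V[:, :j + 1].conj().T @ w     # vs active vectors
                    w = w - V[:, :j + 1] @ c
                    H[:j + 1, j] += c
                if j < m - 1:                    # (breakdown not handled here)
                    H[j + 1, j] = np.linalg.norm(w)
                    V[:, j + 1] = w / H[j + 1, j]
            beta = np.linalg.norm(w)
            evals, Y = np.linalg.eig(H)
            i = np.argmax(np.abs(evals))         # dominant active Ritz pair
            x = V @ Y[:, i]
            if beta * abs(Y[-1, i]) < tol:       # converged: lock it
                locked.append(evals[i])
                Q = np.column_stack([Q, x / np.linalg.norm(x)])
                if len(locked) == nev:
                    return np.array(locked), Q
                v = rng.standard_normal(n).astype(complex)  # fresh start
            else:
                v = x                            # restart with Ritz vector
        raise RuntimeError("no convergence within max_restarts")

For example, calling `restarted_arnoldi(A)` on a random $300\times 300$ matrix should return approximations to its three dominant eigenvalues, which can be checked against `np.linalg.eig`.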

Permalink at https://www.physicslog.com/cs-notes/nm/arnoldi-method

Published on Jan 18, 2023

Last revised on Jan 20, 2023

References
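  • W. E. Arnoldi. The principle of minimized iterations in the solution of the matrix eigenvalue problem. Quarterly of Applied Mathematics, 9:17–29, 1951.
  • Y. Saad. Variations on Arnoldi’s method for computing eigenelements of large unsymmetric matrices. Linear Algebra and its Applications, 34:269–295, 1980.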