Introduction to the Eigenvalue Problem⚓︎
Overall Task
: Given \(\boldsymbol{A}\in \mathbb{R} ^{m\times m}\), we need to find \(\lambda\) and \(\boldsymbol{x}\ne \boldsymbol{0}\) such that \(\boldsymbol{Ax}=\lambda \boldsymbol{x}\).
What are the methods to compute eigenvalues and eigenvectors?
Traditional Methods⚓︎
Traditional Methods Introduction⚓︎
Traditional Method:
- Step 1: Compute the characteristic polynomial \(p_{\boldsymbol{A}}\left( z \right) =\det \left( z\boldsymbol{I}-\boldsymbol{A} \right)\);
- Step 2: Find the roots \(\lambda _i\) of \(p_{\boldsymbol{A}}\left( z \right)\). These are the eigenvalues;
- Step 3: Solve \(\boldsymbol{Ax}=\lambda _i\boldsymbol{x}\) for \(\boldsymbol{x}\); this \(\boldsymbol{x}\) is the eigenvector corresponding to \(\lambda _i\).
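The three steps above can be sketched in NumPy (the \(2\times 2\) matrix is an arbitrary example; as the next subsection shows, this approach is numerically unstable and is shown for illustration only):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Step 1: coefficients of the characteristic polynomial det(zI - A).
coeffs = np.poly(A)

# Step 2: the eigenvalues are the roots of the characteristic polynomial.
eigenvalues = np.roots(coeffs)   # 1 and 3, in some order

# Step 3: for each eigenvalue, an eigenvector spans the null space of A - lam*I.
for lam in eigenvalues:
    # The right singular vector of the smallest singular value spans the
    # (numerical) null space.
    _, _, Vt = np.linalg.svd(A - lam * np.eye(2))
    x = Vt[-1]
    assert np.allclose(A @ x, lam * x, atol=1e-8)
```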
Problem of Traditional Methods⚓︎
Example
: Let
Then \(p_{\boldsymbol{A}}\left( z \right) =\left( z-1 \right) ^2=z^2-2z+1\), so \(\lambda _{1,2}=1\). The coefficients encoded in the computer are \(\left[ \begin{matrix} 1& -2& 1\\ \end{matrix} \right]\). Suppose the input is perturbed to \(\left[ \begin{matrix} 1& -2& 1-10^{-16}\\ \end{matrix} \right]\). Then the polynomial becomes \(z^2-2z+\left( 1-10^{-16} \right) =\left( z-\left( 1-10^{-8} \right) \right) \left( z-\left( 1+10^{-8} \right) \right) =0\), giving \(\lambda _1=1-10^{-8}, \lambda _2=1+10^{-8}\). Therefore, an input error of \(10^{-16}\) produces an output error of \(O(10^{-8})\): the root-finding procedure amplifies the error!
Root finding is not stable here, and algorithms built on root-finding procedures are not stable algorithms.
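The example above can be reproduced directly with NumPy's polynomial root finder:

```python
import numpy as np

# Roots of z^2 - 2z + (1 - 1e-16); the unperturbed polynomial (z - 1)^2
# has the double root z = 1.
perturbed = np.roots([1.0, -2.0, 1.0 - 1e-16])

# An O(1e-16) perturbation of one coefficient moves the roots by O(1e-8).
amplified_error = np.max(np.abs(perturbed - 1.0))
print(amplified_error)
```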
Root Finding vs. Eigenvalue Problems⚓︎
Comparison⚓︎
In fact, root finding on computers is implemented via eigenvalue-solving algorithms.
Claim
: The eigenvalue problem is equivalent to root finding for polynomials.
Given \(p\left( z \right) =z^m+a_{m-1}z^{m-1}+\cdots +a_1z+a_0\), introduce the matrix

$$
\boldsymbol{A}=\left[ \begin{matrix}
	0&		&		&		&		-a_0\\
	1&		0&		&		&		-a_1\\
	&		1&		\ddots&		&		\vdots\\
	&		&		\ddots&		0&		-a_{m-2}\\
	&		&		&		1&		-a_{m-1}\\
\end{matrix} \right] .
$$

\(\boldsymbol{A}\in \mathbb{R} ^{m\times m}\) is called the companion matrix for \(p(z)\). We can prove that

$$
\det \left( z\boldsymbol{I}-\boldsymbol{A} \right) =p\left( z \right) ,
$$

so finding the roots of the original polynomial is equivalent to finding the eigenvalues of the corresponding companion matrix.
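A quick numerical check of this claim (the cubic below is an arbitrary example; NumPy's `np.roots` is itself implemented this way, via companion-matrix eigenvalues):

```python
import numpy as np

# p(z) = z^3 - 6z^2 + 11z - 6 = (z - 1)(z - 2)(z - 3).
a = np.array([-6.0, 11.0, -6.0])   # a_0, a_1, a_2 of the monic cubic
m = len(a)

# Companion matrix: ones on the subdiagonal, -a_i in the last column.
C = np.zeros((m, m))
C[1:, :-1] = np.eye(m - 1)
C[:, -1] = -a

eigs = np.sort(np.linalg.eigvals(C).real)
roots = np.sort(np.roots([1.0, -6.0, 11.0, -6.0]).real)
print(eigs, roots)   # both approximately [1, 2, 3]
```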
Difficulty⚓︎
There is a fundamental difficulty in eigenvalue computation: there is no explicit formula for the eigenvalues of a general matrix \(\boldsymbol{A}_{m\times m}\) when \(m\geqslant 5\).
Theorem
: For any \(m\geqslant 5\), there is a polynomial \(p(z)\) of degree \(m\) with rational coefficients that has a real root \(r\) with \(p(r)=0\), such that \(r\) can NOT be written using any expression involving rational numbers, additions, subtractions, multiplications, divisions, or \(k\)-th roots.
Even working in exact arithmetic, no computer program can produce the exact roots of an arbitrary polynomial of degree \(m\geqslant 5\) in a finite number of steps. Therefore, all eigenvalue methods must be iterative!
Numerical Methods⚓︎
We want to transform the original matrix into one whose eigenvalues are easy to read off (upper/lower triangular or diagonal), and the transformation must not change the eigenvalues (this is the notion of similarity).
Commonly used numerical methods for eigenvalue problem have two phases:
Phase One⚓︎
Phase 1: A direct method to produce an upper Hessenberg matrix \(\boldsymbol{H}\) (an upper Hessenberg matrix has zero entries below the first subdiagonal, and a lower Hessenberg matrix has zero entries above the first superdiagonal):
This procedure finishes within a finite number of steps and is carried out by similarity transforms. In linear algebra, \(\boldsymbol{A}\) and \(\boldsymbol{B}\) are similar ( \(\boldsymbol{A}\sim \boldsymbol{B}\) ) if there exists an invertible \(\boldsymbol{X}\in \mathbb{R} ^{m\times m}\) such that \(\boldsymbol{XAX}^{-1}=\boldsymbol{B}\). We can write

$$
\det \left( z\boldsymbol{I}-\boldsymbol{B} \right) =\det \left( \boldsymbol{X}\left( z\boldsymbol{I}-\boldsymbol{A} \right) \boldsymbol{X}^{-1} \right) =\det \left( \boldsymbol{X} \right) \det \left( z\boldsymbol{I}-\boldsymbol{A} \right) \det \left( \boldsymbol{X}^{-1} \right) .
$$

Because \(\det \left( \boldsymbol{X} \right) \det \left( \boldsymbol{X}^{-1} \right) =1\), the characteristic polynomials of \(\boldsymbol{A}\) and \(\boldsymbol{B}\) are identical, so \(\boldsymbol{A}\) and \(\boldsymbol{B}\) have the same set of eigenvalues.
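As a sketch of how Phase 1 can work, here is a minimal Householder-based Hessenberg reduction. `hessenberg_reduce` is an illustrative helper written for this note, not a library routine; production code would use e.g. `scipy.linalg.hessenberg`:

```python
import numpy as np

def hessenberg_reduce(A):
    """Return H, upper Hessenberg and orthogonally similar to A."""
    H = np.array(A, dtype=float, copy=True)
    m = H.shape[0]
    for k in range(m - 2):
        # Householder vector that zeros H[k+2:, k].
        x = H[k + 1:, k].copy()
        v = x.copy()
        v[0] += np.copysign(np.linalg.norm(x), x[0])
        norm_v = np.linalg.norm(v)
        if norm_v == 0.0:
            continue
        v /= norm_v
        # Apply the reflection from the left and the right: an orthogonal
        # similarity transform, so the eigenvalues are unchanged.
        H[k + 1:, k:] -= 2.0 * np.outer(v, v @ H[k + 1:, k:])
        H[:, k + 1:] -= 2.0 * np.outer(H[:, k + 1:] @ v, v)
    return H

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
H = hessenberg_reduce(A)

# H is upper Hessenberg: (numerically) zero below the first subdiagonal,
# and it has the same eigenvalues as A.
assert np.allclose(np.tril(H, -2), 0.0, atol=1e-12)
assert np.allclose(np.sort(np.linalg.eigvals(A)), np.sort(np.linalg.eigvals(H)))
```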
Phase Two⚓︎
Phase 2: An iterative procedure that is also based on similarity transforms to produce a formally infinite sequence of upper Hessenberg matrices that converges to a triangular form.
Why do we divide the algorithm into two phases? Cost is the reason for the two-phase strategy. Phase 1 costs \(O(m^3)\) flops, while Phase 2 normally converges to within \(O\left( \varepsilon _{machine} \right)\) in \(O(m)\) iterations, each iteration requiring at most \(O(m^2)\) flops thanks to the Hessenberg structure.
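A minimal sketch of Phase 2 is the *unshifted* QR iteration: factor \(\boldsymbol{A}_k=\boldsymbol{Q}_k\boldsymbol{R}_k\) and set \(\boldsymbol{A}_{k+1}=\boldsymbol{R}_k\boldsymbol{Q}_k=\boldsymbol{Q}_k^T\boldsymbol{A}_k\boldsymbol{Q}_k\), a similarity transform. Practical codes add shifts and deflation; the symmetric tridiagonal matrix below is an arbitrary example with distinct eigenvalues, for which the iteration converges to a diagonal matrix:

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])   # symmetric tridiagonal, i.e. already Hessenberg

Ak = A.copy()
for _ in range(200):
    Q, R = np.linalg.qr(Ak)
    Ak = R @ Q   # R Q = Q^T (Q R) Q, a similarity transform of Ak

# The off-diagonal entries decay, and the diagonal approaches the
# eigenvalues of A, which are 3 - sqrt(3), 3, and 3 + sqrt(3).
approx = np.sort(np.diag(Ak))
print(approx)
```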
Building block: the QR factorization.