It is hard to think of a problem more ubiquitous than the diagonalization of a matrix.I will discuss today a statistical approximation method for finding the eigenvalues of a symmetric matrix.
Matrix diagonalization is fundamental to quantum mechanics as well as to most other branch of physics and mathematics. As result considerable effort has been expended on obtaining efficient numerical approximations to the eigenvaues of a matrix. In general, for a matrix of size this takes
operations, although in principle
should suffice.If the matrix is sparse-i.e., has very few non-zero elements-much more efficient methods are available.
I have not seen todays method anywhere before, but it was suggested by a paper on the Principal Component Analysis of large rectangular matrices:
Dimitris Achlioptas and Frank McSherry.Fast Computation of Low Rank Matrix Approximations. STOC 2001, pages 611-618.
If anyone knows of this method being used before, please leave a comment or let me know by email: I am new to this subject. I only talk of the basic idea here. If the method works in practice numerically I will post here those results as well. My undergraduate student Jason Robin is checking it out on some examples. Also, I have decided not to spend time giving rigorous proofs until it is clear that the method (i) is new and (ii) works in practice.
Sum Over Paths
Let be an
symmetric matrix; the spectrum of
is (unordered, allowing for multiplicities) the multi-set of its eigenvalues
;i.e.,roots of the characteristic polynomial
The same information arises as the poles of the function (the Stieljes transform of )
Another convenient way of packaging this information is as the spectral density
If the eigenvalues remain bounded as $n$ becomes large, this can approach a continuous probability density on the real line. Clearly
Conversely, the spectral density is determined by the poles ( or more generally the discontinuity across the branch cut) of .
Yet another way to package the information about the eigenvalues of a matrix are the moments of the spectral density:
This has a graphical interpretation. Think of a complete unoriented graph with vertices labelled by ; i.e., every unordered pair
is an edge. Then the last expression is the sum over all closed paths of length
, the contribution of each edge being a factor equal to the matrix element corresponding to it. Or, we can take a `grand canonical’ version: sum over all paths with a weight
for each edge. This gives the Stieljes transform of the spectral density which is also the generating function of the moments:
Sampling the Matrix
Thus finding the moments of a matrix is the same as averaging over all closed paths in the graph of a matrix. We can now think of estimating this average by taking a sample of paths, a common strategy in averaging over large sets. We must sample paths at random such that the probability of picking a path depends only on its length. This is the discrete version of a well known idea in quantum mechanics, the Feynman path integral; also related is the Wiener integral over Brownian paths. The matrices are replaced there by the kernel of the Schr\”odinger or heat equation respectively. In addition to its conceptual elegance, this point of view had led to practically useful approximation methods. The Monte-Carlo simulation methods of lattice gauge theory are sampling methods applied to path integrals.
Even for finite matrices, we might get useful approximation methods.Suppose we take the original matrix and replace each of its entries independently by a random variable which is zero with probability
and equal to
with probability
. Thus
. More generally
if each edge is distinct from the others. If remains fixed as
, most paths will be non-intersecting in this sense. If we average over many such randomizations, the answer for the moments should therefore be unchanged.
The number of nonzero entries in any sample will be about . If
is held fixed as
this is a sparse matrix that can be diagonalized in about
steps. This can be repeated a large number
times and the average spectral density determined. This should be faster, than the direct diagonalization of the matrix.
Actually often it is the largest (or smallest) eigenvalue that is of interest. In this case we can get a sample of values for these variables: finding the largest eigenvalue of a sparse matrix is quite fast by the Lanczos method. Tracy-Widom theory gives us the a priori probability distribution of these random variables and we can use a statistical inference based on it to get a good estimate of the largest eigenvalue of the original matrix.
In fact we could go further. We choose an edge at random and replace the entry there with such that the average is
; this can reduce further the memory required to store
.But this saving is a constant factor independent of the size of the matrix, so it may not be worth
it.