COMPUTING THE SQUARE ROOT OF A MARKOV MATRIX:
EIGENVALUES AND THE TAYLOR SERIES
Donald R. Burleson, Ph.D.
Copyright © 2005 by Donald R. Burleson. All rights reserved.
For a square (n×n) matrix A with eigenvalues λi and corresponding eigenvectors vi, one may use (see “Eigenvalues and Non-Integer Powers of a Square Matrix”) the relation

$$A^x v_i = \lambda_i^x v_i$$

with x = ½ to define the square root of the matrix A by specifying that A^{1/2} must be a matrix satisfying the relation

$$A^{1/2} v_i = \lambda_i^{1/2} v_i$$

for i = 1, 2, …, n. (That article provides a detailed discussion of this essentially simple computational procedure, along with an important application.)
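By way of illustration, here is a minimal numerical sketch of this definition in Python with NumPy (the code and the helper name markov_sqrt are illustrative additions of mine, not part of the original computation), applied to the 2×2 Markov matrix of the worked example below:

    import numpy as np

    def markov_sqrt(A):
        # Eigendecomposition A = V diag(lam) V^(-1); then take the
        # principal square root of each eigenvalue, so that
        # A^(1/2) = V diag(sqrt(lam)) V^(-1).
        lam, V = np.linalg.eig(A)
        return V @ np.diag(np.sqrt(lam)) @ np.linalg.inv(V)

    A = np.array([[0.8, 0.5],
                  [0.2, 0.5]])   # the Markov matrix of the example below
    R = markov_sqrt(A)
    print(R @ R)                 # recovers A, so R is indeed a square root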
To corroborate the reasonableness of this eigenvalue approach, one may also compute A^{1/2} by making use of an analogy with the standard Taylor series expansion of a function of a real or complex variable, where the function f(x) is expanded about a center x = a:

$$f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}\,(x-a)^n.$$
Taking a = 1 and taking derivatives, one may expand f(x) = x^{1/2} as the series representation

$$x^{1/2} = \sum_{n=0}^{\infty} \binom{1/2}{n}(x-1)^n = 1 + \frac{1}{2}(x-1) - \frac{1}{8}(x-1)^2 + \frac{1}{16}(x-1)^3 - \frac{5}{128}(x-1)^4 + \cdots$$

with radius of convergence ρ = 1.
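As a quick sanity check (my own, not part of the original article), one can sum this scalar series at x = 0.3, a value that will reappear below as the secondary eigenvalue of the worked example; the partial sums approach √0.3:

    from math import sqrt

    # Partial sums of the Taylor series of x^(1/2) about x = 1,
    # evaluated at x = 0.3 (so x - 1 = -0.7, within the radius of convergence).
    x = 0.3
    c, s = 1.0, 0.0                # c = binom(1/2, n), starting at n = 0
    for n in range(30):
        s += c * (x - 1.0)**n
        c *= (0.5 - n) / (n + 1)   # advance to binom(1/2, n+1)
    print(s, sqrt(x))              # the two agree to six or more decimal places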
The analogy, for a function of a square matrix A rather than a real or complex variable x, with a view to computing the square root of A asymptotically, is to employ the identity matrix I in the role of the number 1 (the center of the series) and to treat the powers (A−I)^n in terms of matrix multiplication:

$$A^{1/2} = I + \frac{1}{2}(A-I) - \frac{1}{8}(A-I)^2 + \frac{1}{16}(A-I)^3 - \frac{5}{128}(A-I)^4 + \cdots$$

under an appropriate convergence condition, which as we shall see can be readily determined, at least in the case of a Markov matrix.
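For readers who wish to experiment, the following sketch (again mine, with the hypothetical helper name sqrt_series) accumulates partial sums of this matrix series directly, generating the binomial coefficients binom(1/2, n) by recurrence:

    import numpy as np

    def sqrt_series(A, terms):
        # Partial sum of I + sum over n >= 1 of binom(1/2, n) (A - I)^n.
        I = np.eye(A.shape[0])
        B = A - I
        S = I.copy()
        c = 1.0                       # binom(1/2, 0)
        P = I.copy()                  # holds (A - I)^n
        for n in range(terms):
            c *= (0.5 - n) / (n + 1)  # binom(1/2, n+1)
            P = P @ B                 # (A - I)^(n+1)
            S += c * P
        return S

    A = np.array([[0.8, 0.5], [0.2, 0.5]])
    print(sqrt_series(A, 14))         # close to the exact A^(1/2) found below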
For example, consider the 2×2 Markov state-transition matrix

$$A = \begin{pmatrix} 0.8 & 0.5 \\ 0.2 & 0.5 \end{pmatrix}$$

with

$$A - I = \begin{pmatrix} -0.2 & 0.5 \\ 0.2 & -0.5 \end{pmatrix}.$$

When we compute powers of A−I we find

$$(A-I)^2 = \begin{pmatrix} 0.14 & -0.35 \\ -0.14 & 0.35 \end{pmatrix} = (-0.7)(A-I)$$

and in general

$$(A-I)^n = (-0.7)^{n-1}(A-I),$$

where, it is easy to show, for a 2×2 Markov matrix A with eigenvalues λ1 and λ2, the factor (−0.7) in general represents trace(A−I) (the sum of the eigenvalues of A−I), which is λ1 + λ2 − 2. (Analogous results hold for Markov matrices of higher order.)
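The claimed pattern is easy to confirm numerically (my own check, not the article's); the matrix B = A − I satisfies B² = (−0.7)B, so every higher power merely rescales B:

    import numpy as np

    A = np.array([[0.8, 0.5], [0.2, 0.5]])
    B = A - np.eye(2)
    print(B @ B)                         # equals -0.7 * B
    print(np.allclose(np.linalg.matrix_power(B, 5),
                      (-0.7)**4 * B))    # True: (A-I)^n = (-0.7)^(n-1) (A-I)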
When one writes (−0.7)^{n−1}(A−I) for (A−I)^n in the matrix-valued series above, the series takes the form

$$A^{1/2} = I + \left[\frac{1}{2} - \frac{1}{8}(-0.7) + \frac{1}{16}(-0.7)^2 - \frac{5}{128}(-0.7)^3 + \cdots\right](A-I),$$

which, if one follows the coefficient series out for the first fourteen terms (the convergence being rather slow), produces the result, with entries shown to four decimal places,

$$A^{1/2} \approx \begin{pmatrix} 0.8708 & 0.3230 \\ 0.1292 & 0.6770 \end{pmatrix}.$$

This is of course an approximation, which improves upon adding further terms of the series; the result shown is good to within one ten-thousandth.
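Collapsing the matrix series to a single numerical coefficient of A − I, as above, makes the fourteen-term computation a few lines of code (a sketch of mine, not the article's method of calculation):

    import numpy as np

    A = np.array([[0.8, 0.5], [0.2, 0.5]])
    t = np.trace(A - np.eye(2))      # -0.7
    c, s = 1.0, 0.0
    for n in range(14):              # fourteen terms of the coefficient series
        c *= (0.5 - n) / (n + 1)     # binom(1/2, n+1)
        s += c * t**n
    print(np.eye(2) + s * (A - np.eye(2)))
    # -> [[0.8708, 0.3230], [0.1292, 0.6770]] to four decimal places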
It turns out that a sufficient condition, by the standard Ratio Test, for convergence of this coefficient series (the numerical series standing as the coefficient of the matrix A-I above), is that
| trace(A-I) | < 1.
If λ1 and λ2 are the eigenvalues of the original 2×2 matrix A, this convergence condition is equivalent to the condition
| λ1 + λ2 – 2 | < 1.
For a Markov matrix, since one eigenvalue (see below for the 2×2 case) is always λ1 = 1, this convergence condition is equivalent to the condition | λ2 – 1 | < 1. Thus we have -1 < λ2 – 1 < 1, or 0 < λ2 < 2. We will need this result in a moment.
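To make the Ratio Test step explicit (the article leaves it to the reader): writing t = trace(A−I) and c_n = binom(1/2, n) for the binomial coefficients of the series, successive terms of the coefficient series satisfy

$$\lim_{n\to\infty}\left|\frac{c_{n+1}\,t^{\,n}}{c_n\,t^{\,n-1}}\right| = |t|\,\lim_{n\to\infty}\frac{n-\tfrac{1}{2}}{n+1} = |t|,$$

so the series converges absolutely whenever | trace(A−I) | < 1 and diverges when that quantity exceeds 1 (the boundary case being inconclusive).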
In particular, for a Markov matrix

$$A = \begin{pmatrix} p & 1-q \\ 1-p & q \end{pmatrix}$$

the characteristic polynomial can be shown to be (λ − 1)[λ − (p + q − 1)] (the determinant expansion is written out below), so that its zeros, the eigenvalues of A, are λ1 = 1 and λ2 = p + q − 1. If we assume, as is customary for a Markov transition matrix, that 0 < p < 1 and 0 < q < 1, so that 0 < p + q < 2, then it follows that p + q − 1 < 1, that is, λ2 < 1. This, together with our earlier result 0 < λ2 < 2, implies
0 < λ2 < 1,
which is thus a necessary condition for convergence. [A “non-unity” eigenvalue of a Markov matrix is always less than 1 in absolute value, but the point here is that it might be negative. Note that some Markov matrices fail to satisfy the eigenvalue condition 0 < λ2 < 1 precisely because λ2, though less than 1 in absolute value, may indeed be negative. For example, for the 2×2 Markov matrix with p = q = 0.1, the “secondary” eigenvalue λ2 is -0.8, and the absolute value of the trace of A−I is 1.8 > 1, so that there is no convergence in that case.]
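For completeness, here is the determinant expansion behind the stated characteristic polynomial (a computation the article leaves to the reader), using the form of A displayed above:

$$\det(A - \lambda I) = (p-\lambda)(q-\lambda) - (1-p)(1-q) = \lambda^2 - (p+q)\lambda + (p+q-1) = (\lambda - 1)\,[\lambda - (p+q-1)].$$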
Under the assumptions that 0 < p < 1 and 0 < q < 1 (and λ1 = 1), the convergence condition 0 < λ2 < 1 is in fact equivalent to the “trace” convergence condition stated. We have already seen that the “trace” condition implies the λ2 condition. Conversely, the latter condition 0 < λ2 < 1 implies 0 < λ2 < 2, which implies (since λ1 − 1 = 0 under the assumptions about the matrix A) 0 < λ1 + λ2 − 1 < 2, or -1 < λ1 + λ2 − 2 < 1, or | trace(A−I) | < 1. (Convergence for the matrix used in the worked-out example above is thus assured, since trace(A−I) = −0.7 and |−0.7| < 1.)
Had we used the eigenvalue method (which requires far less computation than the series approach) to compute A^{1/2} in exact form, we would have begun by determining that the eigenvalues of A are λ1 = 1, with corresponding eigenvector v1 = (5,2), and λ2 = 0.3, with corresponding eigenvector v2 = (1,-1). If one writes

$$A^{1/2} = \begin{pmatrix} a & b \\ c & d \end{pmatrix},$$

then the relation A^{1/2}v_i = λ_i^{1/2}v_i (i = 1, 2) gives

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} 5 \\ 2 \end{pmatrix} = \begin{pmatrix} 5 \\ 2 \end{pmatrix}$$

and

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} 1 \\ -1 \end{pmatrix} = \sqrt{0.3}\,\begin{pmatrix} 1 \\ -1 \end{pmatrix},$$

implying, by matrix multiplication, 5a + 2b = 5, 5c + 2d = 2, a − b = √0.3, and c − d = −√0.3. Solving these equations for a, b, c, and d gives the desired result in exact form:

$$A^{1/2} = \frac{1}{7}\begin{pmatrix} 5 + 2\sqrt{0.3} & 5 - 5\sqrt{0.3} \\ 2 - 2\sqrt{0.3} & 2 + 5\sqrt{0.3} \end{pmatrix},$$

which, with entries shown to nine decimal places, is

$$A^{1/2} = \begin{pmatrix} 0.870777874 & 0.323055316 \\ 0.129222126 & 0.676944684 \end{pmatrix},$$

to which the fourteen-term series approximation obtained earlier is seen to agree to within one ten-thousandth. Unlike the series approach, which yields only successive approximations, the eigenvalue method thus produces the square root of a Markov matrix exactly, and with far less computation.
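As a closing numerical check (mine, not part of the original article), one can solve the four linear equations for (a, b, c, d) and confirm both the nine-decimal entries and the identity (A^{1/2})² = A:

    import numpy as np

    # The four equations 5a+2b=5, 5c+2d=2, a-b=sqrt(0.3), c-d=-sqrt(0.3)
    # in the unknowns (a, b, c, d), written as a linear system M x = rhs.
    r = np.sqrt(0.3)
    M = np.array([[5, 2, 0, 0],
                  [0, 0, 5, 2],
                  [1, -1, 0, 0],
                  [0, 0, 1, -1]], dtype=float)
    rhs = np.array([5, 2, r, -r])
    a, b, c, d = np.linalg.solve(M, rhs)
    R = np.array([[a, b], [c, d]])
    print(np.round(R, 9))  # [[0.870777874, 0.323055316], [0.129222126, 0.676944684]]
    print(np.allclose(R @ R, [[0.8, 0.5], [0.2, 0.5]]))  # True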