COMPUTING THE SQUARE ROOT OF A MARKOV MATRIX:
EIGENVALUES AND THE TAYLOR SERIES
Donald R. Burleson, Ph.D.
Copyright © 2005 by Donald R. Burleson. All rights reserved.
For a square (n×n) matrix A with eigenvalues λi and corresponding eigenvectors vi, one may use (see “Eigenvalues and Non-Integer Powers of a Square Matrix”) the relation

$$A^x v_i = \lambda_i^x v_i$$

with x = ½ to define the square root of the matrix A by specifying that A^{1/2} must be a matrix satisfying the relation

$$A^{1/2} v_i = \lambda_i^{1/2} v_i$$

for i = 1, 2, …, n. (That article provides a detailed discussion of this essentially simple computational procedure, along with an important application.)
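By way of illustration, here is a minimal numerical sketch of this definition in Python with NumPy (the code and the helper name markov_sqrt are illustrative additions of mine, not part of the original computation), applied to the 2×2 Markov matrix of the worked example below:

    import numpy as np

    def markov_sqrt(A):
        # Eigendecomposition A = V diag(lam) V^(-1); then take the
        # principal square root of each eigenvalue, so that
        # A^(1/2) = V diag(sqrt(lam)) V^(-1).
        lam, V = np.linalg.eig(A)
        return V @ np.diag(np.sqrt(lam)) @ np.linalg.inv(V)

    A = np.array([[0.8, 0.5],
                  [0.2, 0.5]])   # the Markov matrix of the example below
    R = markov_sqrt(A)
    print(R @ R)                 # recovers A, so R is indeed a square root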
To corroborate the reasonableness of this eigenvalue approach, one may also compute A^{1/2} by making use of an analogy with the standard Taylor series expansion of a function of a real or complex variable, where the function f(x) is expanded about a center x = a:

$$f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}\,(x-a)^n.$$
Taking a = 1 and taking derivatives, one may expand f(x) = x^{1/2} as the series representation

$$x^{1/2} = \sum_{n=0}^{\infty} \binom{1/2}{n}(x-1)^n = 1 + \frac{1}{2}(x-1) - \frac{1}{8}(x-1)^2 + \frac{1}{16}(x-1)^3 - \frac{5}{128}(x-1)^4 + \cdots$$

with radius of convergence ρ = 1.
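As a quick sanity check (my own, not part of the original article), one can sum this scalar series at x = 0.3, a value that will reappear below as the secondary eigenvalue of the worked example; the partial sums approach √0.3:

    from math import sqrt

    # Partial sums of the Taylor series of x^(1/2) about x = 1,
    # evaluated at x = 0.3 (so x - 1 = -0.7, within the radius of convergence).
    x = 0.3
    c, s = 1.0, 0.0                # c = binom(1/2, n), starting at n = 0
    for n in range(30):
        s += c * (x - 1.0)**n
        c *= (0.5 - n) / (n + 1)   # advance to binom(1/2, n+1)
    print(s, sqrt(x))              # the two agree to six or more decimal places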
The analogy, for a function of a square matrix A rather than a real or complex variable x, with a view to computing the square root of A asymptotically, is to employ the identity matrix I in the role of the number 1 (the center of the series) and to treat the powers (A−I)^n in terms of matrix multiplication:

$$A^{1/2} = I + \frac{1}{2}(A-I) - \frac{1}{8}(A-I)^2 + \frac{1}{16}(A-I)^3 - \frac{5}{128}(A-I)^4 + \cdots$$

under an appropriate convergence condition, which as we shall see can be readily determined, at least in the case of a Markov matrix.
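For readers who wish to experiment, the following sketch (again mine, with the hypothetical helper name sqrt_series) accumulates partial sums of this matrix series directly, generating the binomial coefficients binom(1/2, n) by recurrence:

    import numpy as np

    def sqrt_series(A, terms):
        # Partial sum of I + sum over n >= 1 of binom(1/2, n) (A - I)^n.
        I = np.eye(A.shape[0])
        B = A - I
        S = I.copy()
        c = 1.0                       # binom(1/2, 0)
        P = I.copy()                  # holds (A - I)^n
        for n in range(terms):
            c *= (0.5 - n) / (n + 1)  # binom(1/2, n+1)
            P = P @ B                 # (A - I)^(n+1)
            S += c * P
        return S

    A = np.array([[0.8, 0.5], [0.2, 0.5]])
    print(sqrt_series(A, 14))         # close to the exact A^(1/2) found below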
For example, consider the 2×2 Markov state-transition matrix

$$A = \begin{pmatrix} 0.8 & 0.5 \\ 0.2 & 0.5 \end{pmatrix}$$

with

$$A - I = \begin{pmatrix} -0.2 & 0.5 \\ 0.2 & -0.5 \end{pmatrix}.$$

When we compute powers of A−I we find

$$(A-I)^2 = \begin{pmatrix} 0.14 & -0.35 \\ -0.14 & 0.35 \end{pmatrix} = (-0.7)(A-I)$$

and in general

$$(A-I)^n = (-0.7)^{n-1}(A-I),$$

where, it is easy to show, for a 2×2 Markov matrix A with eigenvalues λ1 and λ2, the factor (−0.7) in general represents trace(A−I) (the sum of the eigenvalues of A−I), which is λ1 + λ2 − 2. (Analogous results hold for Markov matrices of higher order.)
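The claimed pattern is easy to confirm numerically (my own check, not the article's); the matrix B = A − I satisfies B² = (−0.7)B, so every higher power merely rescales B:

    import numpy as np

    A = np.array([[0.8, 0.5], [0.2, 0.5]])
    B = A - np.eye(2)
    print(B @ B)                         # equals -0.7 * B
    print(np.allclose(np.linalg.matrix_power(B, 5),
                      (-0.7)**4 * B))    # True: (A-I)^n = (-0.7)^(n-1) (A-I)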
When one writes (−0.7)^{n−1}(A−I) for (A−I)^n in the matrix-valued series above, the series takes the form

$$A^{1/2} = I + \left[\frac{1}{2} - \frac{1}{8}(-0.7) + \frac{1}{16}(-0.7)^2 - \frac{5}{128}(-0.7)^3 + \cdots\right](A-I),$$

which, if one follows the coefficient series out for the first fourteen terms (the convergence being rather slow), produces the result, with entries shown to four decimal places,

$$A^{1/2} \approx \begin{pmatrix} 0.8708 & 0.3230 \\ 0.1292 & 0.6770 \end{pmatrix}.$$

This is of course an approximation, which improves upon adding further terms of the series; the result shown is good to within one ten-thousandth.
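Collapsing the matrix series to a single numerical coefficient of A − I, as above, makes the fourteen-term computation a few lines of code (a sketch of mine, not the article's method of calculation):

    import numpy as np

    A = np.array([[0.8, 0.5], [0.2, 0.5]])
    t = np.trace(A - np.eye(2))      # -0.7
    c, s = 1.0, 0.0
    for n in range(14):              # fourteen terms of the coefficient series
        c *= (0.5 - n) / (n + 1)     # binom(1/2, n+1)
        s += c * t**n
    print(np.eye(2) + s * (A - np.eye(2)))
    # -> [[0.8708, 0.3230], [0.1292, 0.6770]] to four decimal places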
It turns out that a sufficient condition, by the standard Ratio Test, for convergence of this coefficient series (the numerical series standing as the coefficient of the matrix A-I above), is that
| trace(A-I) | < 1.
If λ1 and λ2 are the eigenvalues of the original 2×2 matrix A, this convergence condition is equivalent to the condition
| λ1 + λ2 – 2 | < 1.
For a Markov matrix, since one eigenvalue (see below for the 2×2 case) is always λ1 = 1, this convergence condition is equivalent to the condition | λ2 – 1 | < 1. Thus we have -1 < λ2 – 1 < 1, or 0 < λ2 < 2. We will need this result in a moment.
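To make the Ratio Test step explicit (the article leaves it to the reader): writing t = trace(A−I) and c_n = binom(1/2, n) for the binomial coefficients of the series, successive terms of the coefficient series satisfy

$$\lim_{n\to\infty}\left|\frac{c_{n+1}\,t^{\,n}}{c_n\,t^{\,n-1}}\right| = |t|\,\lim_{n\to\infty}\frac{n-\tfrac{1}{2}}{n+1} = |t|,$$

so the series converges absolutely whenever | trace(A−I) | < 1 and diverges when that quantity exceeds 1 (the boundary case being inconclusive).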
In particular, for a Markov matrix

$$A = \begin{pmatrix} p & 1-q \\ 1-p & q \end{pmatrix}$$

the characteristic polynomial can be shown to be (λ − 1)[λ − (p + q − 1)] (the determinant expansion is written out below), so that its zeros, the eigenvalues of A, are λ1 = 1 and λ2 = p + q − 1. If we assume, as is customary for a Markov transition matrix, that 0 < p < 1 and 0 < q < 1, so that 0 < p + q < 2, then it follows that p + q − 1 < 1, that is, λ2 < 1. This, together with our earlier result 0 < λ2 < 2, implies
0 < λ2 < 1,
which is thus a necessary condition for convergence. [A “non-unity” eigenvalue of a Markov matrix is always less than 1 in absolute value, but the point here is that it might be negative. Note that some Markov matrices fail to satisfy the eigenvalue condition 0 < λ2 < 1 precisely because λ2, though less than 1 in absolute value, may indeed be negative. For example, for the 2×2 Markov matrix with p = q = 0.1, the “secondary” eigenvalue λ2 is -0.8, and the absolute value of the trace of A−I is 1.8 > 1, so that there is no convergence in that case.]
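For completeness, here is the determinant expansion behind the stated characteristic polynomial (a computation the article leaves to the reader), using the form of A displayed above:

$$\det(A - \lambda I) = (p-\lambda)(q-\lambda) - (1-p)(1-q) = \lambda^2 - (p+q)\lambda + (p+q-1) = (\lambda - 1)\,[\lambda - (p+q-1)].$$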
Under the assumptions that 0 < p < 1 and 0 < q < 1 (and λ1 = 1), the convergence condition 0 < λ2 < 1 is in fact equivalent to the “trace” convergence condition stated. We have already seen that the “trace” condition implies the λ2 condition. Conversely, the latter condition 0 < λ2 < 1 implies 0 < λ2 < 2, which implies (since λ1 − 1 = 0 under the assumptions about the matrix A) 0 < λ1 + λ2 − 1 < 2, or -1 < λ1 + λ2 − 2 < 1, or | trace(A−I) | < 1. (Convergence for the matrix used in the worked-out example above is thus assured, since trace(A−I) = −0.7 and |−0.7| < 1.)
Had we used the eigenvalue method (which requires far less computation than the series approach) to compute A^{1/2} in exact form, we would have begun by determining that the eigenvalues of A are λ1 = 1, with corresponding eigenvector v1 = (5,2), and λ2 = 0.3, with corresponding eigenvector v2 = (1,-1). If one writes

$$A^{1/2} = \begin{pmatrix} a & b \\ c & d \end{pmatrix},$$

then the relation A^{1/2}v_i = λ_i^{1/2}v_i (i = 1, 2) gives

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} 5 \\ 2 \end{pmatrix} = \begin{pmatrix} 5 \\ 2 \end{pmatrix}$$

and

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} 1 \\ -1 \end{pmatrix} = \sqrt{0.3}\,\begin{pmatrix} 1 \\ -1 \end{pmatrix},$$

implying, by matrix multiplication, 5a + 2b = 5, 5c + 2d = 2, a − b = √0.3, and c − d = −√0.3. Solving these equations for a, b, c, and d gives the desired result in exact form:

$$A^{1/2} = \frac{1}{7}\begin{pmatrix} 5 + 2\sqrt{0.3} & 5 - 5\sqrt{0.3} \\ 2 - 2\sqrt{0.3} & 2 + 5\sqrt{0.3} \end{pmatrix},$$

which, with entries shown to nine decimal places, is

$$A^{1/2} = \begin{pmatrix} 0.870777874 & 0.323055316 \\ 0.129222126 & 0.676944684 \end{pmatrix},$$

to which the fourteen-term series approximation obtained earlier is seen to agree to within one ten-thousandth. Unlike the series approach, which yields only successive approximations, the eigenvalue method thus produces the square root of a Markov matrix exactly, and with far less computation.
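As a closing numerical check (mine, not part of the original article), one can solve the four linear equations for (a, b, c, d) and confirm both the nine-decimal entries and the identity (A^{1/2})² = A:

    import numpy as np

    # The four equations 5a+2b=5, 5c+2d=2, a-b=sqrt(0.3), c-d=-sqrt(0.3)
    # in the unknowns (a, b, c, d), written as a linear system M x = rhs.
    r = np.sqrt(0.3)
    M = np.array([[5, 2, 0, 0],
                  [0, 0, 5, 2],
                  [1, -1, 0, 0],
                  [0, 0, 1, -1]], dtype=float)
    rhs = np.array([5, 2, r, -r])
    a, b, c, d = np.linalg.solve(M, rhs)
    R = np.array([[a, b], [c, d]])
    print(np.round(R, 9))  # [[0.870777874, 0.323055316], [0.129222126, 0.676944684]]
    print(np.allclose(R @ R, [[0.8, 0.5], [0.2, 0.5]]))  # True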