Donald R. Burleson, Ph.D.

Copyright © 2005 by Donald R. Burleson.  All rights reserved.





            For a square (n×n) matrix A with eigenvalues λi and corresponding eigenvectors vi, one may use (see “Eigenvalues and Non-Integer Powers of a Square Matrix”) the relation

A^x vi = λi^x vi


with x = ½ to define the square root of the matrix A by specifying that A^(1/2) must be a matrix satisfying the relation

A^(1/2) vi = λi^(1/2) vi


for i = 1, 2, …, n.  (See “Eigenvalues and Non-Integer Powers of a Square Matrix” for a detailed discussion of this essentially simple computational procedure, and an important application.)
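Before specializing to x = ½, the defining relation can be sanity-checked numerically for integer powers. The Python sketch below uses an illustrative 2×2 column-stochastic (Markov) matrix; its entries and eigenpairs are assumptions made for this sketch, not prescribed by the text.

```python
# Check the relation A^x v_i = lambda_i^x v_i for integer powers x,
# using an illustrative 2x2 column-stochastic (Markov) matrix; the
# entries and eigenpairs below are assumptions made for this sketch.

def matvec(M, v):
    """Multiply a 2x2 matrix (list of rows) by a 2-vector."""
    return [M[0][0]*v[0] + M[0][1]*v[1],
            M[1][0]*v[0] + M[1][1]*v[1]]

A = [[0.8, 0.5],
     [0.2, 0.5]]

pairs = [(1.0, [5.0, 2.0]),     # lambda = 1   with eigenvector (5, 2)
         (0.3, [1.0, -1.0])]    # lambda = 0.3 with eigenvector (1, -1)

for lam, v in pairs:
    w = v
    for x in range(1, 4):                   # w = A^x v after x applications
        w = matvec(A, w)
        expected = [lam**x * comp for comp in v]
        assert all(abs(p - q) < 1e-12 for p, q in zip(w, expected))
print("A^x v = lambda^x v verified for x = 1, 2, 3")
```

Since the relation holds for every positive integer x, taking x = ½ is the natural interpolation.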


            To corroborate the reasonableness of this eigenvalue approach, one may also compute A^(1/2) by making use of an analogy with the standard Taylor series expansion of a function of a real or complex variable, where the function f(x) is expanded about a center x = a:

f(x) = Σ (n = 0 to ∞) [f^(n)(a)/n!] (x - a)^n.

Taking a = 1 and taking derivatives, one may expand f(x) = x^(1/2) as the series representation

x^(1/2) = Σ (n = 0 to ∞) C(1/2, n) (x - 1)^n = 1 + (1/2)(x - 1) - (1/8)(x - 1)^2 + (1/16)(x - 1)^3 - …,

where C(1/2, n) = (1/2)(1/2 - 1)⋯(1/2 - n + 1)/n! is the generalized binomial coefficient,

with radius of convergence ρ = 1.
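This scalar series is easy to confirm numerically. In the sketch below, the 40-term cutoff and the sample points are arbitrary choices made for illustration.

```python
# Partial sums of the binomial (Taylor) series for sqrt(x) about x = 1:
# sqrt(x) = sum_{n>=0} C(1/2, n) (x - 1)^n, valid for |x - 1| < 1.
# The 40-term cutoff and the sample points are arbitrary choices.
import math

def binom_half(n):
    """Generalized binomial coefficient C(1/2, n)."""
    c = 1.0
    for k in range(n):
        c *= (0.5 - k) / (k + 1)
    return c

def sqrt_series(x, terms=40):
    """Sum the first `terms` terms of the series for sqrt(x) about 1."""
    return sum(binom_half(n) * (x - 1.0)**n for n in range(terms))

for x in (0.3, 0.9, 1.5):
    print(f"x = {x}: series {sqrt_series(x):.9f}, math.sqrt {math.sqrt(x):.9f}")
```

At x = 0.3 the ratio |x - 1| = 0.7 makes the convergence visibly slower than at x = 0.9, which foreshadows the slow convergence of the matrix series below.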


            The analogy, for a function of a square matrix A rather than a real or complex variable x, with a view to computing the square root of A asymptotically, is to employ the identity matrix I in the role of the number 1 (the center of the series) and to treat the powers (A - I)^n in terms of matrix multiplication:

A^(1/2) = Σ (n = 0 to ∞) C(1/2, n) (A - I)^n = I + (1/2)(A - I) - (1/8)(A - I)^2 + …
under an appropriate convergence condition, which as we shall see can be readily determined, at least in the case of a Markov matrix.


            For example, consider the 2×2 Markov state-transition matrix

A =  [ 0.8   0.5 ]
     [ 0.2   0.5 ].
            When we compute powers of A - I we find

A - I =  [ -0.2    0.5 ],     (A - I)^2 =  [  0.14   -0.35 ]  =  (-0.7)(A - I),
         [  0.2   -0.5 ]                   [ -0.14    0.35 ]

and in general

(A - I)^n = (-0.7)^(n-1) (A - I),

where, it is easy to show, for a 2×2 Markov matrix A with eigenvalues λ1 and λ2, the factor (-0.7) in general represents trace(A - I) (the sum of the eigenvalues of A - I), which is λ1 + λ2 - 2.  (Analogous statements hold for Markov matrices of higher order.)
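The collapse of the powers (A - I)^n onto the single matrix A - I is easy to confirm by direct multiplication. A small Python check, assuming the example entries [[0.8, 0.5], [0.2, 0.5]] for A:

```python
# Direct verification that (A - I)^n = (-0.7)^(n-1) (A - I) when
# trace(A - I) = -0.7; the entries of A are an assumed 2x2 example.

def matmul(M, N):
    """Multiply two 2x2 matrices given as lists of rows."""
    return [[M[0][0]*N[0][0] + M[0][1]*N[1][0], M[0][0]*N[0][1] + M[0][1]*N[1][1]],
            [M[1][0]*N[0][0] + M[1][1]*N[1][0], M[1][0]*N[0][1] + M[1][1]*N[1][1]]]

A = [[0.8, 0.5],
     [0.2, 0.5]]
B = [[A[0][0] - 1.0, A[0][1]],          # B = A - I
     [A[1][0], A[1][1] - 1.0]]

t = B[0][0] + B[1][1]                   # trace(A - I) = lambda1 + lambda2 - 2

P = B
for n in range(2, 6):                   # P = (A - I)^n
    P = matmul(P, B)
    for i in range(2):
        for j in range(2):
            assert abs(P[i][j] - t**(n - 1) * B[i][j]) < 1e-12
print(f"trace(A - I) = {t:.1f}; power-collapse identity holds for n = 2..5")
```

The identity works because A - I has rank one here (its rows are proportional), so multiplying by it can only rescale by its trace.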


            When one writes (-0.7)^(n-1) (A - I) for (A - I)^n in the matrix-valued Taylor series for A^(1/2), one obtains

A^(1/2) = I + [ Σ (n = 1 to ∞) C(1/2, n) (-0.7)^(n-1) ] (A - I),

which, if one follows the coefficient series out for the first fourteen terms (the convergence being rather slow), produces the approximate result

A^(1/2) ≈  [ 0.870797   0.323008 ]
           [ 0.129203   0.676992 ].
This is of course an approximation, which improves upon adding further terms of the series; the result shown is good to within one-ten-thousandth.
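The fourteen-term computation can be reproduced in a few lines. The sketch below, again assuming A = [[0.8, 0.5], [0.2, 0.5]], builds the coefficient series, forms I + (coefficient)(A - I), and measures the deviation from the exact square root obtained by the eigenvalue method:

```python
# Fourteen-term Taylor approximation of A^(1/2) via
# A^(1/2) ~ I + [sum_{n=1}^{14} C(1/2, n) (-0.7)^(n-1)] (A - I).
# The entries of A below are an assumed example (eigenvalues 1 and 0.3).
import math

def binom_half(n):
    """Generalized binomial coefficient C(1/2, n)."""
    c = 1.0
    for k in range(n):
        c *= (0.5 - k) / (k + 1)
    return c

A = [[0.8, 0.5], [0.2, 0.5]]
t = (A[0][0] - 1.0) + (A[1][1] - 1.0)      # trace(A - I) = -0.7

coeff = sum(binom_half(n) * t**(n - 1) for n in range(1, 15))   # 14 terms

# Approximate square root: I + coeff * (A - I)
S = [[1.0 + coeff * (A[0][0] - 1.0), coeff * A[0][1]],
     [coeff * A[1][0], 1.0 + coeff * (A[1][1] - 1.0)]]

# Exact square root from the eigenvalue method, for comparison.
r = math.sqrt(0.3)
E = [[(5 + 2*r) / 7, (5 - 5*r) / 7],
     [(2 - 2*r) / 7, (2 + 5*r) / 7]]

err = max(abs(S[i][j] - E[i][j]) for i in range(2) for j in range(2))
print(f"max entrywise error after 14 terms: {err:.2e}")
assert err < 1e-4                          # good to within one-ten-thousandth
```

Adding further terms shrinks the error geometrically, at the rate |trace(A - I)|^n = 0.7^n.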


It turns out that a sufficient condition, by the standard Ratio Test, for convergence of this coefficient series (the numerical series standing as the coefficient of the matrix A-I above), is that


| trace(A-I) |  < 1.


If λ1 and λ2 are the eigenvalues of the original 2×2 matrix A, this convergence condition is equivalent to the condition


| λ1 +  λ2 – 2 | < 1.


            For a Markov matrix, since one eigenvalue (see below for the 2×2 case) is always λ1 = 1, this convergence condition is equivalent to the condition | λ2 – 1 | < 1.  Thus we have -1 < λ2 – 1 < 1, or 0 < λ2 < 2.  We will need this result in a moment.


In particular, for a Markov matrix

A =  [ p       1 - q ]
     [ 1 - p   q     ],

the characteristic polynomial can be shown to be (λ - 1)[λ - (p + q - 1)], so that its zeros, the eigenvalues of A, are λ1 = 1 and λ2 = p + q - 1.  If we assume, as is customary for a Markov transition matrix, that 0 < p < 1 and 0 < q < 1 so that 0 < p + q < 2, then it follows that p + q - 1 < 1, that is, λ2 < 1.  This, together with our earlier result 0 < λ2 < 2, implies


0 < λ2 < 1,


which is thus a necessary condition for convergence.  [A “non-unity” eigenvalue of a Markov matrix is always less than 1 in absolute value, but the point here is that it might be negative.  Indeed, some Markov matrices fail the condition 0 < λ2 < 1 precisely because λ2, though less than 1 in absolute value, is negative.  For example, for the 2×2 Markov matrix with p = q = 0.1, the “secondary” eigenvalue is λ2 = -0.8, and the absolute value of the trace of A - I is 1.8 > 1, so there is no Taylor convergence.]
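The trace condition and the behavior of the secondary eigenvalue can be packaged as a tiny checker. In the sketch below, (p, q) parameterizes the 2×2 Markov matrix [[p, 1-q], [1-p, q]] as above; the convergent test case (p, q) = (0.8, 0.5) is an illustrative choice, while p = q = 0.1 is the divergent case from the bracketed remark.

```python
# Ratio-test convergence condition |trace(A - I)| < 1 for the 2x2 Markov
# matrix [[p, 1-q], [1-p, q]], whose eigenvalues are 1 and p + q - 1.
# The convergent (p, q) test case is an illustrative choice.

def taylor_converges(p, q):
    """Sufficient condition for convergence of the matrix Taylor series."""
    trace_a_minus_i = (p - 1.0) + (q - 1.0)    # = lambda1 + lambda2 - 2
    return abs(trace_a_minus_i) < 1.0

def secondary_eigenvalue(p, q):
    """The 'non-unity' eigenvalue lambda2 = p + q - 1."""
    return p + q - 1.0

# Convergent case: lambda2 = 0.3, |trace(A - I)| = 0.7 < 1.
assert taylor_converges(0.8, 0.5)
assert 0 < secondary_eigenvalue(0.8, 0.5) < 1

# Divergent case from the text: p = q = 0.1 gives lambda2 = -0.8
# and |trace(A - I)| = 1.8 > 1, so the series does not converge.
assert not taylor_converges(0.1, 0.1)
assert abs(secondary_eigenvalue(0.1, 0.1) + 0.8) < 1e-12
print("Taylor convergence holds exactly when 0 < lambda2 < 1")
```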


            Under the assumptions that 0 < p < 1 and 0 < q < 1 (and λ1 = 1) the convergence condition 0 < λ2 < 1 is in fact equivalent to the “trace” convergence condition stated.  We have already seen that the “trace” condition implies the λ2 condition.  Conversely, the latter condition 0 < λ2 < 1 implies 0 < λ2 < 2 which implies (since λ1 – 1 = 0 under the assumptions about the matrix A) 0 < λ1 + λ2 – 1 < 2 or -1 < λ1 + λ2 – 2 < 1 or | trace(A-I) | < 1.  (Convergence for the matrix used in the worked-out Taylor series example was assured by virtue of the fact that λ2 = 0.3 so that we had 0 < λ2 < 1.  It turns out that this condition on “non-unity” eigenvalues applies to Taylor series convergence for larger matrices as well.)


            Had we used the eigenvalue method (which requires far less computation than the Taylor series) to find A^(1/2) in exact form, we would have begun by determining that the eigenvalues of A are λ1 = 1 with corresponding eigenvector v1 = (5, 2) and λ2 = 0.3 with corresponding eigenvector v2 = (1, -1).  If one writes

A^(1/2) =  [ a   b ]
           [ c   d ],
then the relation A^(1/2) vi = λi^(1/2) vi (i = 1, 2) gives

A^(1/2) (5, 2)^T = (5, 2)^T     and     A^(1/2) (1, -1)^T = √0.3 (1, -1)^T,

implying, by matrix multiplication, 5a + 2b = 5, 5c + 2d = 2, a - b = √0.3, and c - d = -√0.3.  Solving these equations for a, b, c, and d gives the desired result in exact form:

A^(1/2) =  (1/7) [ 5 + 2√0.3   5 - 5√0.3 ]
                 [ 2 - 2√0.3   2 + 5√0.3 ],

which, with entries shown to nine decimal places, is

A^(1/2) ≈  [ 0.870777874   0.323055316 ]
           [ 0.129222126   0.676944684 ],

to which the Taylor series result closely corresponds, and even more closely corresponds when one adds more terms.
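The eigenvalue-method solution can be carried out mechanically. The sketch below assumes the eigenpairs stated above (λ1 = 1 with v1 = (5, 2), λ2 = 0.3 with v2 = (1, -1)), solves the four linear equations in closed form, and confirms that the resulting matrix squares back to A.

```python
# Eigenvalue method for A^(1/2): impose A^(1/2) v_i = sqrt(lambda_i) v_i.
# The eigenpairs below are assumptions taken from the worked 2x2 example:
#   lambda1 = 1 with v1 = (5, 2), lambda2 = 0.3 with v2 = (1, -1).
import math

r = math.sqrt(0.3)

# Writing A^(1/2) = [[a, b], [c, d]], the eigenvector conditions give
# 5a + 2b = 5, 5c + 2d = 2, a - b = r, c - d = -r, hence:
a = (5 + 2*r) / 7
b = (5 - 5*r) / 7
c = (2 - 2*r) / 7
d = (2 + 5*r) / 7
S = [[a, b], [c, d]]

# Consistency check: S squared should reproduce A = [[0.8, 0.5], [0.2, 0.5]].
A = [[0.8, 0.5], [0.2, 0.5]]
SS = [[a*a + b*c, a*b + b*d],
      [c*a + d*c, c*b + d*d]]
for i in range(2):
    for j in range(2):
        assert abs(SS[i][j] - A[i][j]) < 1e-12

for row in S:                       # entries to nine decimal places
    print("  ".join(f"{x:.9f}" for x in row))
```

Because S shares A's eigenvectors with eigenvalues 1 and √0.3, the check S² = A succeeds to machine precision; no series truncation is involved.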


            Unlike the Taylor series approach, the eigenvalue method does not encounter the problem of establishing conditions of convergence for an infinite series.  The question of matrix-valued Taylor series convergence, however, as we have seen, does involve conditions on the eigenvalues.