ON NON-INTEGER POWERS OF A SQUARE MATRIX
Donald R. Burleson, Ph.D.
Copyright (c) 2005 by Donald R. Burleson. All rights reserved.
Given a square (n-by-n) matrix $A$, the zeros $\lambda_i$ of the $n$th-degree polynomial $\det(A - \lambda I)$ (where $I$ represents the identity matrix of the same size as $A$) are the eigenvalues of $A$, and to each eigenvalue $\lambda_i$ there corresponds an eigenvector $v_i$ (a vector unique up to a scalar multiple) such that, for $1 \le i \le n$,
$$A v_i = \lambda_i v_i.$$
That is, the nature of an eigenvalue and its corresponding eigenvector is that the effect of multiplying the matrix $A$ onto the vector is the same as multiplying the number $\lambda_i$ onto the vector. Also,
$$A^2 v_i = A(A v_i) = A(\lambda_i v_i) = \lambda_i (A v_i) = \lambda_i (\lambda_i v_i) = \lambda_i^2 v_i,$$
and in fact one may easily show by induction on n that
$$A^n v_i = \lambda_i^n v_i$$
for all positive integers $n$. Thus we have, in this context, the very economical substitution of a mere power of the number $\lambda_i$ for the more laboriously computed power $A^n$ of the given matrix, having the same multiplicative effect on the eigenvector. By extension, it has the same multiplicative effect on any vector in the same space, when that vector is written in terms of an eigenbasis $\{v_i\}$: for $w = \sum_i k_i v_i$, in terms of appropriate basis-representation scalar coefficients $k_i$, we would have $A^n w = \sum_i k_i \lambda_i^n v_i$, since $A$ represents a linear transformation.
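For readers who wish to experiment, here is a minimal numerical sketch of this substitution (in Python with numpy; the example matrix is the 2-by-2 matrix examined below, and the test vector $w$ is an arbitrary choice of my own):

    import numpy as np

    A = np.array([[3.0, 1.0],
                  [2.0, 2.0]])
    lam, V = np.linalg.eig(A)      # eigenvalues lam_i; columns of V are v_i

    n = 4
    An = np.linalg.matrix_power(A, n)

    # A^n v_i = lam_i^n v_i for each eigenpair
    for i in range(len(lam)):
        assert np.allclose(An @ V[:, i], lam[i] ** n * V[:, i])

    # For w = sum k_i v_i, compare A^n w with sum k_i lam_i^n v_i
    w = np.array([5.0, -1.0])
    k = np.linalg.solve(V, w)      # coordinates of w in the eigenbasis
    assert np.allclose(An @ w, V @ (k * lam ** n))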
It is natural to consider generalizing the relation $A^n v_i = \lambda_i^n v_i$ to non-integer values of the exponent $n$. [Remark: By the method about to be described, for a nonsingular matrix $A$, if one puts $n = -1$ one simply produces the multiplicative inverse matrix $A^{-1}$ by a somewhat circuitous way of computing it; all other negative-integer values of $n$ of course produce powers of $A^{-1}$.]
Let’s look at the case $n = 1/2$, so that the relation in question becomes
$$A^{1/2} v_i = \lambda_i^{1/2} v_i.$$
The “square root” of the matrix $A$ should reasonably be defined as a matrix $M$ such that $MM = A$ and such that $M = A^{1/2}$ satisfies the above relation for any given eigenvalue and corresponding eigenvector. And indeed it turns out that the eigenvalues of $A$ provide a way of computing it.
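In matrix form, the computation about to be illustrated amounts to diagonalizing $A$ and taking square roots of the eigenvalues. Here is a short sketch of that recipe (the function name frac_power is my own, and principal complex roots are assumed for negative eigenvalues):

    import numpy as np

    def frac_power(A, p):
        """A^p = V diag(lam**p) V^(-1) for a diagonalizable matrix A."""
        lam, V = np.linalg.eig(A)
        return V @ np.diag(lam.astype(complex) ** p) @ np.linalg.inv(V)

    # usage: M = frac_power(A, 0.5) gives a square root of A, with M @ M ~ A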
Take for example the 2-by-2 matrix
$$A = \begin{pmatrix} 3 & 1 \\ 2 & 2 \end{pmatrix}.$$
The eigenvalues $\lambda_i$ are the zeros of the characteristic polynomial of $A$:
$$\det(A - \lambda I) = \det\begin{pmatrix} 3-\lambda & 1 \\ 2 & 2-\lambda \end{pmatrix} = (3-\lambda)(2-\lambda) - 2(1) = \lambda^2 - 5\lambda + 4 = (\lambda-4)(\lambda-1),$$
so that the spectrum of $A$, i.e. the set of all eigenvalues of $A$, is $\{\lambda_1 = 4, \lambda_2 = 1\}$. Let an eigenvector corresponding to $\lambda_1$ be $(x, y)$. Putting
$$\begin{pmatrix} 3 & 1 \\ 2 & 2 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = 4\begin{pmatrix} x \\ y \end{pmatrix},$$
so that $3x + y = 4x$ (and equivalently $2x + 2y = 4y$), we have $y = x$, and we may most simply take $x = y = 1$, so that the corresponding eigenvector $(x, y)$ is $v_1 = (1, 1)$.
In like manner, for the other eigenvalue $\lambda_2 = 1$ we may derive the eigenvector $v_2 = (1, -2)$.
If we denote the desired square root matrix as
$$A^{1/2} = \begin{pmatrix} a & b \\ c & d \end{pmatrix},$$
with a view to finding the elements $a$, $b$, $c$, and $d$, we have, from the relation $A^{1/2} v_1 = \lambda_1^{1/2} v_1$:
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = 2\begin{pmatrix} 1 \\ 1 \end{pmatrix},$$
which by matrix multiplication implies $a + b = 2$ and $c + d = 2$. Similarly, from $A^{1/2} v_2 = \lambda_2^{1/2} v_2$ we have
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} 1 \\ -2 \end{pmatrix} = 1 \cdot \begin{pmatrix} 1 \\ -2 \end{pmatrix},$$
which implies $a - 2b = 1$ and $c - 2d = -2$. Solving the $a, b$ equations together and the $c, d$ equations together gives $a = 5/3$, $b = 1/3$, $c = 2/3$, and $d = 4/3$, so that the desired square root matrix is
$$A^{1/2} = \frac{1}{3}\begin{pmatrix} 5 & 1 \\ 2 & 4 \end{pmatrix}.$$
As a check, since we should have $M^2 = A$, we may multiply:
$$\frac{1}{3}\begin{pmatrix} 5 & 1 \\ 2 & 4 \end{pmatrix} \cdot \frac{1}{3}\begin{pmatrix} 5 & 1 \\ 2 & 4 \end{pmatrix} = \frac{1}{9}\begin{pmatrix} 27 & 9 \\ 18 & 18 \end{pmatrix} = \begin{pmatrix} 3 & 1 \\ 2 & 2 \end{pmatrix} = A.$$
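The same check can be run numerically; a sketch in Python/numpy, using the matrices found above:

    import numpy as np

    A = np.array([[3.0, 1.0],
                  [2.0, 2.0]])
    M = np.array([[5.0, 1.0],
                  [2.0, 4.0]]) / 3.0   # the square root found above
    assert np.allclose(M @ M, A)       # M^2 reproduces A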
Different sorts of eigenvalues may produce different sorts of results. (For present purposes we will look only at scenarios with distinct eigenvalues, though similar results may be pursued in other cases.) For example, the matrix
$$B = \begin{pmatrix} 2 & -1 \\ -6 & 1 \end{pmatrix}$$
has eigenvalue spectrum $\{\lambda_1 = 4, \lambda_2 = -1\}$ with corresponding eigenvectors $v_1 = (1, -2)$ and $v_2 = (1, 3)$, and since one of the eigenvalues is negative, with $\lambda_2^{1/2} = (-1)^{1/2} = i$, the square root matrix turns out to have complex entries when we work it out. One obtains
$$B^{1/2} = \frac{1}{5}\begin{pmatrix} 6+2i & -2+i \\ -12+6i & 4+3i \end{pmatrix},$$
which, as one may verify, checks upon multiplication:
$$\frac{1}{25}\begin{pmatrix} 6+2i & -2+i \\ -12+6i & 4+3i \end{pmatrix}^2 = \frac{1}{25}\begin{pmatrix} 50 & -25 \\ -150 & 25 \end{pmatrix} = \begin{pmatrix} 2 & -1 \\ -6 & 1 \end{pmatrix} = B.$$
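Again the verification is easy to do numerically; a sketch with complex entries, using the matrices as reconstructed above:

    import numpy as np

    B = np.array([[ 2.0, -1.0],
                  [-6.0,  1.0]])
    M = np.array([[  6 + 2j, -2 + 1j],
                  [-12 + 6j,  4 + 3j]]) / 5.0
    assert np.allclose(M @ M, B)    # the complex square root squares back to B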
Naturally the same process applies to larger square matrices. E.g., for a 3-by-3 triangular matrix $C$, which (since it is triangular) has eigenvalues equal to its diagonal elements 1, 9, and 4, the same eigenvector procedure produces a square root matrix $C^{1/2}$ with $C^{1/2} C^{1/2} = C$.
(The only limitation here, with larger matrices, is the difficulty of finding the zeros of the characteristic polynomial; for degree $n \ge 5$ they can in general only be approximated.)
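In practice one simply lets a numerical routine approximate the eigenvalues. A sketch for a 3-by-3 triangular matrix (only the diagonal 1, 9, 4 is taken from the discussion above; the off-diagonal entries here are hypothetical illustrations of my own):

    import numpy as np

    C = np.array([[1.0, 2.0, 3.0],
                  [0.0, 9.0, 5.0],
                  [0.0, 0.0, 4.0]])
    lam, V = np.linalg.eig(C)       # returns the diagonal entries 1, 9, 4
    S = V @ np.diag(np.sqrt(lam)) @ np.linalg.inv(V)
    assert np.allclose(S @ S, C)    # S is a square root of C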
One may of course pursue higher-index “roots” of a square matrix $A$, e.g. $A^{1/3}$, $A^{1/4}$, etc., by working with the appropriate fractional power. For example, with $n = 1/3$, the relation $A^{1/3} v_i = \lambda_i^{1/3} v_i$ yields, by the same procedure, a matrix $M = A^{1/3}$, which checks as $MMM = A$.
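As a concrete sketch, one may take the cube root of the earlier 2-by-2 matrix $A$ (Python/numpy):

    import numpy as np

    A = np.array([[3.0, 1.0],
                  [2.0, 2.0]])
    lam, V = np.linalg.eig(A)
    M = V @ np.diag(lam ** (1.0 / 3.0)) @ np.linalg.inv(V)   # A^(1/3)
    assert np.allclose(M @ M @ M, A)    # M^3 recovers A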
In a similar fashion we may compute non-integer square-matrix powers in general, since the method only involves raising the eigenvalues to the powers in question and following the procedure shown.
Remark: By using the identity (where $i$ denotes the imaginary unit) $\lambda^i = (e^{\ln \lambda})^i = e^{i \ln \lambda} = \cos[\ln \lambda] + i \sin[\ln \lambda]$ (where if $\lambda < 0$ we may use the principal complex value of the logarithm, i.e. $\ln|\lambda| + \pi i$), one may extend the determination of $A^z$ to complex powers: $A^z = A^{x+iy} = A^x (A^y)^i$. In this connection it is interesting to note that for a nonsingular square matrix $A$, if we use the indicated method to raise $A$ to the power $i$ and then raise the resulting matrix $A^i$ to the power $i$, producing $(A^i)^i = A^{-1}$, all in terms of principal complex values, the result is the ordinary inverse of $A$. Since the action of raising the nonsingular matrix $A$ to the power $i$ twice in succession has the effect of producing $A^{-1}$, we may call the matrix $A^i$ the semi-inverse of $A$. For purposes of raising the eigenvalues to the power $i$, it is helpful to recall that Euler’s formula implies $(-1)^i = e^{-\pi}$ in principal value (while $1^i = e^{-2\pi}$ on the branch $\ln 1 = 2\pi i$; in principal value $1^i = 1$).
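A numerical sketch of the semi-inverse, again with the earlier matrix $A$ (the helper frac_power and the reliance on numpy’s principal-valued complex powers are my own assumptions):

    import numpy as np

    def frac_power(A, z):
        # eigenvalue method with principal complex values throughout
        lam, V = np.linalg.eig(A)
        return V @ np.diag(lam.astype(complex) ** z) @ np.linalg.inv(V)

    A = np.array([[3.0, 1.0],
                  [2.0, 2.0]])
    Ai = frac_power(A, 1j)          # the "semi-inverse" A^i
    assert np.allclose(frac_power(Ai, 1j), np.linalg.inv(A))  # (A^i)^i = A^(-1)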
But let us look at an important application of the eigenvalue-based method of raising a square matrix A to real but non-integer powers.
APPLICATION: Computing a market-share transition matrix from knowledge of cumulative market-share changes over two or more transition periods
Consider, by way of quick illustration, a simple economic scenario in which there are two brands A and B of a certain product. Suppose that over any given “transition” period (a period in which buyers may or may not switch brands, e.g. possibly a week, since many people shop weekly) there is a constant pattern of consumer “defection” or non-defection, defined by the following column-stochastic transition matrix $T$:
$$T = \begin{pmatrix} 0.7 & 0.2 \\ 0.3 & 0.8 \end{pmatrix}.$$
This means that, over one transition period, 70% of A-users will stay put with Brand A while 30% will defect to Brand B; and of B-users, 20% will defect to A and 80% will stick with B. (The first column is the “from A” column; the second column is the “from B” column; the first row is the “to A” row; the second row is the “to B” row.) If the current market-share configuration is 81% for A and 19% for B, the matrix multiplication
$$\begin{pmatrix} 0.7 & 0.2 \\ 0.3 & 0.8 \end{pmatrix}\begin{pmatrix} 0.81 \\ 0.19 \end{pmatrix} = \begin{pmatrix} 0.605 \\ 0.395 \end{pmatrix}$$
shows that after one transition period the market shares will be 60.5% for A and 39.5% for B, an improvement for B because B’s retention rate is better than A’s. (Further transitions will show further improvement for B, but only up to a point. It can be shown that if the transition pattern $T$ remains steady over sufficient time, the market-share sequence (a Markov chain) approaches a fixed market-share vector (the “steady-state” vector) of $(0.40, 0.60)$. I.e., eventually A will have 40% of the market and B will have 60%. A’s situation will not deteriorate beyond that point, and B’s situation will not further improve.)
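A quick sketch of this transition arithmetic, including the approach to the steady state (Python/numpy, using the figures above):

    import numpy as np

    T = np.array([[0.7, 0.2],
                  [0.3, 0.8]])      # column-stochastic transition matrix
    s = np.array([0.81, 0.19])      # current shares: 81% A, 19% B

    print(T @ s)                    # one period later: [0.605, 0.395]

    for _ in range(50):             # iterate toward the steady state
        s = T @ s
    print(s)                        # approaches [0.4, 0.6]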
Now suppose the matrix
$$M = \begin{pmatrix} 0.55 & 0.30 \\ 0.45 & 0.70 \end{pmatrix}$$
is known to describe the market-share changes after two transition periods. (I.e., we have empirically observed that by final tally after two such periods, 55% of “going-in” A-users have stuck with A while 45% have defected to B, and 30% of “going-in” B-users have defected to A with 70% staying with B.) Suppose also that the basic transition matrix $T$ describing the market-share changes over one typical transition period is unknown, and this is what we need to find. (Among other things, it is useful to recover $T$ if one plans to project market-share patterns over several further transition periods.)
Since $M$ describes the total effect of two standard transitions $T$, we have $TT = M$, and we need to compute $T = M^{1/2}$, which by the above-illustrated eigenvalue method turns out to be
$$T = \begin{pmatrix} 0.7 & 0.2 \\ 0.3 & 0.8 \end{pmatrix}.$$
Thus from knowing the cumulative changes over two transition periods, we have been able to recover the transition structure that prevails over each single transition period, always assuming this structure to be constant for the duration in question. If the recovered matrix structure T is indeed constant, it becomes a predictor of the effects over the next transition period, as well as subsequent ones.
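The recovery of $T$ is a one-line application of the square-root computation; a sketch:

    import numpy as np

    M = np.array([[0.55, 0.30],
                  [0.45, 0.70]])    # observed two-period matrix
    lam, V = np.linalg.eig(M)
    T = V @ np.diag(np.sqrt(lam)) @ np.linalg.inv(V)
    print(T)                        # recovers [[0.7, 0.2], [0.3, 0.8]]
    assert np.allclose(T @ T, M)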
Similarly, if we knew that a matrix $M$ described cumulative market-share changes over three transition periods, we could recover the basic one-period transition matrix by computing $T = M^{1/3}$, and similarly for higher-order iterations.
In fact, if we knew, for example, that the market-share changes after five periodic iterations were described by the matrix
$$M = \begin{pmatrix} 0.41875 & 0.3875 \\ 0.58125 & 0.6125 \end{pmatrix},$$
and if for some reason we needed to find out what matrix describes the state of things at the end of the third of those five transition periods (where, again, the basic transition matrix $T$ is unknown), we could use the above eigenvalue method to compute
$$M^{3/5} = \begin{pmatrix} 0.475 & 0.35 \\ 0.525 & 0.65 \end{pmatrix},$$
finding that by the end of the third (out of five) transition periods, brand A had a 47.5% brand retention to B’s 65.0%. In this scenario, if the initial market shares are 81% (A) and 19% (B), we could recover the fact that after three transitions the market-share configuration must have been
$$\begin{pmatrix} 0.475 & 0.35 \\ 0.525 & 0.65 \end{pmatrix}\begin{pmatrix} 0.81 \\ 0.19 \end{pmatrix} = \begin{pmatrix} 0.45125 \\ 0.54875 \end{pmatrix},$$
i.e. about 45.1% for A and 54.9% for B,
and this from knowing only the “end” conditions. Clearly, in this manner, with a fixed transition structure T, whether initially known or not, we may recover the accumulated effects of market share changes at any point in the chain. And of course we may obtain the original basic transition matrix T in similar fashion.
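The whole five-period computation can be sketched the same way (with the five-period matrix as reconstructed above from the stated third-period retentions under a constant $T$):

    import numpy as np

    M5 = np.array([[0.41875, 0.3875],
                   [0.58125, 0.6125]])  # cumulative effect of five periods
    lam, V = np.linalg.eig(M5)
    T3 = V @ np.diag(lam ** (3.0 / 5.0)) @ np.linalg.inv(V)   # M5^(3/5)
    print(T3)                           # [[0.475, 0.35], [0.525, 0.65]]
    print(T3 @ np.array([0.81, 0.19]))  # shares after period three: ~[0.451, 0.549]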
Obviously there are analogous applications in any similar Markov-chain situation.