Diagonalization by a Similarity Transformation

Definition.

A matrix \(\ \boldsymbol{A}\in M_n(C)\ \) is diagonalizable by a similarity transformation if there exists an invertible matrix \(\ \boldsymbol{P}\in M_n(C)\ \) such that

(1)\[\boldsymbol{P}^{-1}\boldsymbol{A}\,\boldsymbol{P}\ =\ \boldsymbol{D},\]

where \(\ \boldsymbol{D}\in M_n(C)\ \) is a diagonal matrix.

We say that \(\ \boldsymbol{P}\ \) is a diagonalizing matrix or that it diagonalizes the matrix \(\ \boldsymbol{A}.\)

Lemma.

Consider matrices \(\ \boldsymbol{A},\,\boldsymbol{P}\in M_n(C).\) Columns \(\ \boldsymbol{X}_1,\ \boldsymbol{X}_2,\ldots, \boldsymbol{X}_n\ \) of the matrix \(\ \boldsymbol{P}\ \) are eigenvectors of the matrix \(\ \boldsymbol{A}:\)

(2)\[\boldsymbol{A}\,\boldsymbol{X}_1\,=\,\lambda_1\,\boldsymbol{X}_1,\quad \boldsymbol{A}\,\boldsymbol{X}_2\,=\,\lambda_2\,\boldsymbol{X}_2,\quad \ldots\quad \boldsymbol{A}\,\boldsymbol{X}_n\,=\,\lambda_n\,\boldsymbol{X}_n\]

if and only if

(3)\[\boldsymbol{A}\,\boldsymbol{P}\,=\,\boldsymbol{P}\,\boldsymbol{D},\]

where \(\ \boldsymbol{D}\,=\, \text{diag}(\lambda_1,\lambda_2,\ldots,\lambda_n)\,.\)

Indeed, according to the column rule of matrix multiplication:

\[\begin{split}\begin{array}{rl} \boldsymbol{A}\,\boldsymbol{P} \!\! & =\ \ \boldsymbol{A}\ \,[\ \boldsymbol{X}_1\,|\ \boldsymbol{X}_2\,| \ \ldots\,|\ \boldsymbol{X}_n\,]\ \ = \\[6pt] & =\ \ [\ \boldsymbol{A}\,\boldsymbol{X}_1\,|\ \boldsymbol{A}\,\boldsymbol{X}_2\,|\ \ldots\,|\ \boldsymbol{A}\,\boldsymbol{X}_n\,] \\[10pt] \boldsymbol{P}\,\boldsymbol{D} \!\! & =\ \ [\ \lambda_1\,\boldsymbol{X}_1\,|\ \lambda_2\,\boldsymbol{X}_2\,|\ \ldots\,|\ \lambda_n\,\boldsymbol{X}_n\,] , \end{array}\end{split}\]

and thus the conditions (2) and (3) are equivalent.
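The column-by-column computation above can be replayed numerically. The sketch below (plain Python, no libraries) uses a hypothetical example not taken from the text: a matrix \(\,\boldsymbol{A}\,\) with eigenpairs \(\,(1,(1,-1)^T)\,\) and \(\,(3,(1,1)^T)\).

```python
# Verify the Lemma: A P = P D, column by column, for a concrete 2x2 example.
# Hypothetical data: A = [[2,1],[1,2]] has eigenpairs (1, (1,-1)) and (3, (1,1)).

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

A = [[2, 1],
     [1, 2]]
P = [[1, 1],       # columns: X1 = (1,-1), X2 = (1,1)
     [-1, 1]]
D = [[1, 0],       # D = diag(lambda1, lambda2) = diag(1, 3)
     [0, 3]]

AP = matmul(A, P)  # columns: A X1, A X2
PD = matmul(P, D)  # columns: lambda1 X1, lambda2 X2
print(AP == PD)    # True: conditions (2) and (3) agree on this example
```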

Theorem 7.

A matrix \(\ \boldsymbol{A}\ \) is diagonalizable by a similarity transformation (1) if and only if the space \(\,C^n\,\) has a basis \(\,\mathcal{B} = (\boldsymbol{X}_1,\, \boldsymbol{X}_2,\,\ldots,\,\boldsymbol{X}_n)\ \) consisting of eigenvectors of the matrix \(\ \boldsymbol{A}:\)

\[\boldsymbol{A}\,\boldsymbol{X}_1\,=\,\lambda_1\,\boldsymbol{X}_1,\quad \boldsymbol{A}\,\boldsymbol{X}_2\,=\,\lambda_2\,\boldsymbol{X}_2,\quad \ldots,\quad \boldsymbol{A}\,\boldsymbol{X}_n\,=\,\lambda_n\,\boldsymbol{X}_n\,,\quad \lambda_1,\lambda_2,\ldots,\lambda_n\in C\,.\]

Then the matrix \(\ \boldsymbol{P}\,=\, [\ \boldsymbol{X}_1\,|\,\boldsymbol{X}_2\,|\,\ldots\,|\,\boldsymbol{X}_n\ ],\ \) whose columns are vectors from the basis \(\,\mathcal{B}\,,\ \) diagonalizes the matrix \(\,\boldsymbol{A}.\)

Proof.

A matrix \(\ \boldsymbol{A}\ \) is diagonalizable by a similarity transformation if there exists an invertible matrix \(\ \boldsymbol{P}\,\equiv\, [\ \boldsymbol{X}_1\,|\ \boldsymbol{X}_2\,|\ \ldots\,|\ \boldsymbol{X}_n\,]\ \) such that

\[\boldsymbol{P}^{-1}\boldsymbol{A}\,\boldsymbol{P}\ =\ \boldsymbol{D},\]

where \(\ \boldsymbol{D}\ \) is a diagonal matrix: \(\ \boldsymbol{D}\,=\,\text{diag}(\lambda_1,\lambda_2,\ldots,\lambda_n).\ \) This is equivalent to the conditions

\[\boldsymbol{A}\,\boldsymbol{P}\,=\,\boldsymbol{P}\,\boldsymbol{D} \quad\text{and}\quad \det{\boldsymbol{P}}\neq 0.\]

The condition \(\ \det{\boldsymbol{P}}\neq 0\ \) means that \(\ \mathcal{B} = (\boldsymbol{X}_1,\, \boldsymbol{X}_2,\,\ldots,\,\boldsymbol{X}_n)\ \) comprises a linearly independent set of vectors of the space \(\ C^n.\ \) Moreover, by the above Lemma:

\[\boldsymbol{A}\,\boldsymbol{X}_1\,=\,\lambda_1\,\boldsymbol{X}_1,\quad \boldsymbol{A}\,\boldsymbol{X}_2\,=\,\lambda_2\,\boldsymbol{X}_2,\quad \ldots\quad \boldsymbol{A}\,\boldsymbol{X}_n\,=\,\lambda_n\,\boldsymbol{X}_n .\]

In an \(\ n\)-dimensional vector space every set of \(\,n\,\) linearly independent vectors is a basis. Since \(\ \dim{C^n}=n,\ \) the system \(\,\mathcal{B}\ \) is therefore a basis of the space \(\ C^n;\ \) it is a basis consisting of eigenvectors of the matrix \(\,\boldsymbol{A}.\)

On the other hand, if eigenvectors \(\ \boldsymbol{X}_1,\,\boldsymbol{X}_2,\, \ldots,\,\boldsymbol{X}_n\ \) of the matrix \(\,\boldsymbol{A}\in M_n(C)\,\) form a basis of the space \(\,C^n,\ \) then the matrix \(\,\boldsymbol{P}\,\) whose columns are these eigenvectors is invertible, and by the Lemma it diagonalizes the matrix \(\,\boldsymbol{A}.\)
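The proof is constructive and can be checked in exact arithmetic for a small example. In the sketch below (plain Python; the matrix and its eigenvectors are illustrative choices, not from the text), \(\,\boldsymbol{P}\,\) is assembled from eigenvectors and \(\,\boldsymbol{P}^{-1}\boldsymbol{A}\,\boldsymbol{P}=\boldsymbol{D}\,\) is verified.

```python
from fractions import Fraction

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def inv2(P):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = P
    det = Fraction(a * d - b * c)
    assert det != 0, "P must be invertible (its columns must form a basis)"
    return [[d / det, -b / det],
            [-c / det, a / det]]

# Illustrative example: A = [[2,1],[1,2]] with eigenvectors (1,-1) and (1,1).
A = [[2, 1],
     [1, 2]]
P = [[1, 1],
     [-1, 1]]

D = matmul(inv2(P), matmul(A, P))
print(D == [[1, 0], [0, 3]])  # True: D = diag(1, 3), the eigenvalues of A
```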

Comments and corollaries.

1.) Every matrix \(\,\boldsymbol{A}\in M_n(C)\,\) has at least one eigenvalue \(\,\lambda\,\) with an associated eigenvector \(\,\boldsymbol{X}.\ \) Hence, because equation (2) does not require the eigenvalues \(\,\lambda_i\ \) and the associated eigenvectors \(\,\boldsymbol{X}_i\ \) to be distinct, there always exists a matrix \(\,\boldsymbol{P}\ \) such that equation (3) holds. In particular, one may take

\[\lambda_1\,=\,\lambda_2\,=\,\ldots\,=\,\lambda_n\,=\,\lambda,\quad \boldsymbol{X}_1\,=\,\boldsymbol{X}_2\,=\,\ldots\,=\,\boldsymbol{X}_n\,=\, \boldsymbol{X}.\]

Then \(\,\boldsymbol{A}\boldsymbol{P}= \boldsymbol{P}\boldsymbol{D}=\lambda\,\boldsymbol{P},\ \) but the matrix \(\ \boldsymbol{P}\ \) is not invertible, and thus relation (1) does not hold.

2.) The formula \(\ \boldsymbol{D}\,=\, \boldsymbol{P}^{-1}\boldsymbol{A}\,\boldsymbol{P}\ \) may be interpreted in terms of transformation of a matrix of a linear operator under a change of basis. Consider the space \(\,C^n\ \) with the canonical basis \(\ \mathcal{E}\,=\,(\boldsymbol{e}_1,\boldsymbol{e}_2,\ldots\, \boldsymbol{e}_n).\ \) Let \(\boldsymbol{A}\ \) be the matrix of a linear operator \(F\in \text{End}(C^n)\ \) defined by \(\ F(\boldsymbol{x})\,:\,=\,\boldsymbol{A}\boldsymbol{x},\ \) \(\,\boldsymbol{x}\in C^n.\ \) If eigenvectors \(\ \boldsymbol{X}_1,\boldsymbol{X}_2,\ldots,\boldsymbol{X}_n\ \) of the operator \(\,F\,\) are linearly independent, then the matrix \(\ \boldsymbol{P}\,=\, [\ \boldsymbol{X}_1\,|\,\boldsymbol{X}_2\,|\,\ldots\,|\,\boldsymbol{X}_n\ ]\ \) is the transition matrix from the canonical basis \(\,\mathcal{E}\,\) to the basis \(\,\mathcal{B}\,=\, (\boldsymbol{X}_1,\boldsymbol{X}_2,\ldots\,\boldsymbol{X}_n)\ \) consisting of the eigenvectors.

Hence, \(\boldsymbol{D}\ \) is the matrix of the operator \(\,F\ \) in the basis \(\,\mathcal{B}\ \) consisting of its eigenvectors. As one should expect, this is a diagonal matrix with the eigenvalues of \(\,F\ \) on the diagonal.

3.) We already know that eigenvectors of a linear operator which are associated with distinct eigenvalues are linearly independent.

Corollary. If a matrix \(\,\boldsymbol{A}\in M_n(C)\ \) has \(\,n\,\) distinct eigenvalues, then there exists a similarity transformation which diagonalizes this matrix.

Indeed, if columns of the matrix \(\,\boldsymbol{P}\,\) are eigenvectors of the matrix \(\,\boldsymbol{A}\,\) which are associated with distinct eigenvalues, then the matrix \(\,\boldsymbol{P}\,\) is non-degenerate: \(\,\det{\boldsymbol{P}}\neq 0,\ \) and thus invertible.

4.) Eigenvectors of a normal operator which are associated with distinct eigenvalues form an orthogonal system, and after normalization an orthonormal one. A matrix whose columns form an orthonormal system is unitary.

Corollary. Let \(\,\boldsymbol{A}\in M_n(C)\ \) be a normal (e.g. Hermitian or unitary) matrix. If \(\,\boldsymbol{A}\ \) has \(\,n\,\) distinct eigenvalues, then there exists a unitary similarity transformation which diagonalizes this matrix (a diagonalizing matrix \(\,\boldsymbol{P}\ \) is unitary: \(\ \boldsymbol{P}^+\boldsymbol{P}=\boldsymbol{I}_n).\)

Remark. A normal matrix does not have to have \(\,n\,\) distinct eigenvalues to be diagonalizable. Namely, one can prove a more general

Theorem 8.

A matrix \(\,\boldsymbol{A}\in M_n(C)\ \) is diagonalizable by a unitary similarity transformation if and only if it is normal.
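Whether a given matrix is normal, i.e. satisfies \(\,\boldsymbol{A}\boldsymbol{A}^+=\boldsymbol{A}^+\boldsymbol{A},\,\) can be checked directly. The sketch below (plain Python; the example matrix is an illustrative choice) tests a real orthogonal matrix, which is normal but not Hermitian.

```python
# Check the normality condition A A^+ = A^+ A directly.
# Illustrative example: A = [[0,1],[-1,0]] is real orthogonal (hence normal)
# but not symmetric; its eigenvalues are the non-real numbers i and -i.

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def dagger(A):
    """Hermitian conjugate: transpose and complex-conjugate each entry."""
    return [[A[j][i].conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]

A = [[0 + 0j, 1 + 0j],
     [-1 + 0j, 0 + 0j]]

print(matmul(A, dagger(A)) == matmul(dagger(A), A))  # True: A is normal
```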

Application to real matrices.

For a real matrix \(\,\boldsymbol{A}:\ \) \(\,\boldsymbol{A}\in M_n(R),\ \) we have \(\,\boldsymbol{A}^+=\boldsymbol{A}^T.\ \) Therefore

\[\boldsymbol{A}^+=\boldsymbol{A} \quad\Leftrightarrow\quad \boldsymbol{A}^T=\boldsymbol{A}\]

(a real Hermitian matrix is symmetric), and

\[\boldsymbol{A}^+\boldsymbol{A}=\boldsymbol{I}_n \quad\Leftrightarrow\quad \boldsymbol{A}^T\boldsymbol{A}=\boldsymbol{I}_n\]

(a real unitary matrix is orthogonal).

Theorem 9.

Every real symmetric or orthogonal matrix is diagonalizable by a unitary similarity transformation.

Eigenvalues of a real symmetric matrix are real, and its eigenvectors may be chosen real. Hence, the unitary diagonalizing matrix may be chosen as a real orthogonal matrix.

Corollary. Every real symmetric matrix is diagonalizable by a real orthogonal similarity transformation.
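For a real symmetric matrix the construction is explicit: orthogonal eigenvectors, once normalized, give an orthogonal \(\,\boldsymbol{P}.\,\) A sketch (plain Python; the example matrix and its eigenvectors are illustrative choices):

```python
import math

# For the real symmetric matrix A = [[2,1],[1,2]] the eigenvectors (1,-1)
# and (1,1) are orthogonal; normalizing them yields a diagonalizing matrix P
# with P^T P = I, i.e. a real orthogonal P.

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

s = 1 / math.sqrt(2)
P = [[s, s],
     [-s, s]]

PtP = matmul([list(row) for row in zip(*P)], P)  # P^T P

I2 = [[1, 0], [0, 1]]
ok = all(abs(PtP[i][j] - I2[i][j]) < 1e-12 for i in range(2) for j in range(2))
print(ok)  # True: P is orthogonal
```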

In contrast to the previous case, eigenvalues of a real orthogonal matrix (and thus also its eigenvectors) may be complex rather than real. Then the unitary diagonalizing matrix will also be complex rather than real.
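This can be seen concretely on a rotation matrix. The sketch below (plain Python; the angle is an arbitrary illustrative choice) computes the eigenvalues of \(\,R(t)=\begin{pmatrix}\cos t&-\sin t\\ \sin t&\cos t\end{pmatrix}\,\) from its characteristic polynomial \(\,x^2-2x\cos t+1,\,\) whose roots are \(\,e^{\pm it}.\)

```python
import cmath
import math

# Eigenvalues of the real orthogonal rotation matrix R(t):
# characteristic polynomial x^2 - 2 cos(t) x + 1, with roots e^{+it}, e^{-it}.
t = math.pi / 3          # illustrative angle
tr, det = 2 * math.cos(t), 1.0

disc = cmath.sqrt(tr * tr - 4 * det)
lam1 = (tr + disc) / 2
lam2 = (tr - disc) / 2

# For t not a multiple of pi the discriminant is negative, so the
# eigenvalues are genuinely complex.
print(abs(lam1 - cmath.exp(1j * t)) < 1e-12)   # True
print(abs(lam2 - cmath.exp(-1j * t)) < 1e-12)  # True
```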

Theorem 10.

If a matrix \(\ \boldsymbol{A}\ \) is diagonalizable by a similarity transformation, then the algebraic multiplicity of every eigenvalue of \(\ \boldsymbol{A}\ \) is equal to its geometric multiplicity.

Proof. \(\ \) If a transformation \(\ \boldsymbol{A}\ \rightarrow\ \boldsymbol{P}^{-1}\boldsymbol{A}\,\boldsymbol{P}\ \equiv\boldsymbol{D}\ \) diagonalizes the matrix \(\ \boldsymbol{A},\ \) then \(\ \boldsymbol{D}\,= \text{diag}(\lambda_1,\,\lambda_2,\,\ldots,\,\lambda_n),\ \) where \(\ \lambda_1,\lambda_2,\ldots,\lambda_n\ \) are the eigenvalues of the matrix \(\,\boldsymbol{A}\,\) listed with multiplicities. Since \(\,\boldsymbol{A}\,\) and \(\,\boldsymbol{D}\,\) are similar, they have the same characteristic polynomial, so the number of times an eigenvalue \(\,\lambda_i\,\) occurs on the diagonal of \(\ \boldsymbol{D}\ \) equals its algebraic multiplicity. At the same time, the columns of \(\,\boldsymbol{P}\,\) associated with those diagonal entries are that many linearly independent eigenvectors of \(\,\boldsymbol{A},\,\) so the geometric multiplicity of \(\,\lambda_i\,\) is at least the algebraic one; since the geometric multiplicity never exceeds the algebraic one, the two are equal.
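The contrapositive gives a quick non-diagonalizability test. The sketch below (plain Python; the Jordan block is a standard illustrative example) exhibits a matrix whose geometric multiplicity falls short of the algebraic one.

```python
# Contrapositive of Theorem 10: the Jordan block A = [[3,1],[0,3]] has the
# eigenvalue 3 with algebraic multiplicity 2, but A - 3I = [[0,1],[0,0]]
# has rank 1, so the geometric multiplicity is 2 - 1 = 1.  By Theorem 10,
# A is not diagonalizable by any similarity transformation.

def rank2x2(M):
    """Rank of a 2x2 matrix: 2 if det != 0, 1 if some entry != 0, else 0."""
    (a, b), (c, d) = M
    if a * d - b * c != 0:
        return 2
    return 1 if any(x != 0 for x in (a, b, c, d)) else 0

B = [[0, 1],
     [0, 0]]                  # A - 3 I
geometric = 2 - rank2x2(B)    # dim ker(A - 3I), by the rank-nullity theorem
print(geometric)              # 1, strictly less than the algebraic multiplicity 2
```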