
Linear Algebra and Matrix Analysis


Solving Linear Equations

Rules:

  1. $(AB)^{-1} = B^{-1}A^{-1}$
  2. $A^{-1}A = I = AA^{-1}$
  3. $(AB)\mathbf x = A(B\mathbf x)$
  4. $A = LU = (\text{lower triangular})(\text{upper triangular}) = LDU$

$A = LU$ is "unsymmetric" because $U$ has the pivots on its diagonal where $L$ has 1's. This is easy to change: divide $U$ by a diagonal matrix $D$ that contains the pivots. That leaves a new triangular matrix with 1's on the diagonal:
Split $U$ into $\begin{bmatrix}d_1 & & & \\ & d_2 & & \\ & & \ddots & \\ & & & d_n \end{bmatrix}\begin{bmatrix} 1 & u_{12}/d_1 & u_{13}/d_1 & \cdot \\ & 1 & u_{23}/d_2 & \cdot \\ & & \ddots & \vdots \\ & & & 1\end{bmatrix}$
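
As a quick numeric check, a minimal sketch using SciPy's `lu` (the example matrix is made up): divide the pivots out of $U$ to recover the $LDU$ form.

```python
import numpy as np
from scipy.linalg import lu

# a made-up 3x3 matrix (any nonsingular example works)
A = np.array([[2., 1., 1.],
              [4., -6., 0.],
              [-2., 7., 2.]])

P, L, U = lu(A)              # SciPy returns A = P @ L @ U
d = np.diag(U)               # the pivots d_1, ..., d_n
D = np.diag(d)
U_unit = U / d[:, None]      # divide each row of U by its pivot: 1's on the diagonal

print(np.allclose(A, P @ L @ D @ U_unit))   # True: (P L) D U_unit rebuilds A
```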

  1. $(AB)^T = B^TA^T$

Vector Spaces and Subspaces

  1. A real vector space is a set of "vectors" together with rules for vector addition and for multiplication by real numbers.
  2. A subspace of a vector space is a set of vectors (including $\mathbf 0$) that satisfies two requirements: if $\mathbf v$ and $\mathbf w$ are vectors in the subspace and $c$ is any scalar, then
     1. $\mathbf v + \mathbf w$ is in the subspace.
     2. $c\mathbf v$ is in the subspace.
  3. A subspace containing $\mathbf v$ and $\mathbf w$ must contain all linear combinations $c\mathbf v + d\mathbf w$.
  4. The column space consists of all linear combinations of the columns. The combinations are all possible vectors $A\mathbf x$. They fill the column space $\mathbf C(A)$.
     For example, $A\mathbf x = \begin{bmatrix} 1 & 0 \\ 4 & 3 \\ 2 & 3\end{bmatrix}\begin{bmatrix}x_1 \\ x_2\end{bmatrix} = x_1\begin{bmatrix} 1\\ 4\\ 2\end{bmatrix} + x_2\begin{bmatrix} 0 \\3 \\ 3\end{bmatrix}$

This column space is crucial. To solve $A\mathbf x = \mathbf b$ is to express $\mathbf b$ as a combination of the columns.

The system $A\mathbf x = \mathbf b$ is solvable if and only if $\mathbf b$ is in the column space of $A$.
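
A minimal numeric check of this criterion (made-up $A$ and $b$): $A\mathbf x = \mathbf b$ is solvable exactly when appending $b$ does not raise the rank.

```python
import numpy as np

A = np.array([[1., 0.],
              [4., 3.],
              [2., 3.]])
b_in  = A @ np.array([2., -1.])        # b built from the columns, so it is in C(A)
b_out = np.array([1., 0., 0.])         # a generic b, almost surely outside C(A)

def solvable(A, b):
    # Ax = b has a solution exactly when b adds nothing new to the column space
    return np.linalg.matrix_rank(np.column_stack([A, b])) == np.linalg.matrix_rank(A)

print(solvable(A, b_in), solvable(A, b_out))   # True False
```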

  1. The nullspace $N(A)$ in $R^n$ contains all solutions $\mathbf x$ to $A\mathbf x = \mathbf 0$. This includes $\mathbf x = \mathbf 0$.
  2. Elimination (from $A$ to $U$ to $R$) does not change the nullspace: $N(A) = N(U) = N(R)$.
  3. Suppose $A\mathbf x = \mathbf 0$ has more unknowns than equations ($n > m$, more columns than rows). There must be at least one free column. Then $A\mathbf x = \mathbf 0$ has nonzero solutions.

A short wide matrix ($n > m$) always has nonzero vectors in its nullspace. There must be at least $n - m$ free variables, since the number of pivots cannot exceed $m$. (The matrix has only $m$ rows, and a row never has two pivots.) Of course a row might have no pivot, which means an extra free variable. But here is the point: when there is a free variable, it can be set to 1. Then the equation $A\mathbf x = \mathbf 0$ has at least a line of nonzero solutions.

  1. The nullspace is a subspace. Its "dimension" is the number of free variables. This is a central idea: the dimension of a subspace.
  2. The rank of $A$ is the number of pivots. This number is $r$. Every free column is a combination of earlier pivot columns. It is the special solutions $\mathbf s$ that tell us those combinations.

$\text{Number of pivots} = \text{number of nonzero rows in } R = \text{rank } r$. There are $n - r$ free columns.

  1. The column space of a rank one matrix is "one-dimensional". Such a matrix has the special rank one form $A = \text{column times row} = \mathbf {uv}^T$, where $\mathbf u$ and $\mathbf v$ are basis vectors for the column space and row space.
  2. The complete solution to $A\mathbf x = \mathbf b$ is
     $\mathbf x = \text{(one particular solution } \mathbf x_p \text{) + (any } \mathbf x_n \text{ in the nullspace)}$

The particular solution solves $A\mathbf x_p = \mathbf b$.
The $n - r$ special solutions solve $A\mathbf x_n = \mathbf 0$.
Then $A\mathbf x_p + A\mathbf x_n = A(\mathbf x_p + \mathbf x_n) = \mathbf b + \mathbf 0 = \mathbf b$.

Suppose $A$ is a square invertible matrix, $m = n = r$. The particular solution is the one and only solution $\mathbf x_p = A^{-1}\mathbf b$. There are no special solutions or free variables.
$R = I$ has no zero rows. The only vector in the nullspace is $\mathbf x_n = \mathbf 0$. The complete solution is $\mathbf x =\mathbf x_p + \mathbf x_n = A^{-1}\mathbf b + \mathbf 0$.
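
A small numeric sketch of $\mathbf x = \mathbf x_p + \mathbf x_n$ (made-up rectangular $A$ of rank 2; $\mathbf x_p$ comes from a least-squares solve, the nullspace basis from SciPy):

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1., 3., 0., 2.],
              [0., 0., 1., 4.],
              [1., 3., 1., 6.]])          # rank 2, so n - r = 2 special solutions
b = np.array([1., 6., 7.])                # chosen to lie in the column space

x_p, *_ = np.linalg.lstsq(A, b, rcond=None)   # one particular solution
N = null_space(A)                              # columns span N(A)

c = np.array([2.0, -1.0])                      # any coefficients work
x = x_p + N @ c                                # x_p + (something in the nullspace)
print(np.allclose(A @ x, b))                   # True
```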

  1. Every matrix $A$ with full column rank ($r = n$) has all these properties:
     1. All columns of $A$ are pivot columns.
     2. There are no free variables or special solutions.
     3. The nullspace $N(A)$ contains only the zero vector $\mathbf x = \mathbf 0$.
     4. If $A\mathbf x = \mathbf b$ has a solution (it might not) then it has only one solution.
  2. Every matrix $A$ with full row rank ($r = m$) has all these properties:
     1. All rows have pivots, and $R$ has no zero rows.
     2. $A\mathbf x = \mathbf b$ has a solution for every right side $\mathbf b$.
     3. The column space is the whole space $R^m$.
     4. There are $n - r = n - m$ special solutions in the nullspace of $A$.

In this case with $m$ pivots, the rows are "linearly independent". So the columns of $A^T$ are linearly independent, and the nullspace of $A^T$ contains only the zero vector.

  1. The columns of $A$ are linearly independent when the only solution to $A\mathbf x = \mathbf 0$ is $\mathbf x = \mathbf 0$. No other combination $A\mathbf x$ of the columns gives the zero vector.
  2. The sequence of vectors $v_1, \cdots , v_n$ is linearly independent if the only combination that gives the zero vector is $0v_1 + 0v_2 + \cdots + 0v_n$. That is, $x_1v_1 + x_2v_2 + \cdots + x_nv_n = 0$ only happens when all the $x$'s are zero.
  3. The row space of $A$ is $C(A^T)$, the column space of $A^T$.
  4. The vectors $v_1, \cdots , v_n$ are a basis for $R^n$ exactly when they are the columns of an $n \times n$ invertible matrix. Thus $R^n$ has infinitely many different bases.

The pivot columns of $A$ are a basis for its column space. The pivot rows of $A$ are a basis for its row space. So are the pivot rows of its echelon form $R$.

  1. If $v_1,\cdots, v_m$ and $w_1, \cdots, w_n$ are both bases for the same vector space, then $m = n$. The dimension of a space is the number of vectors in every basis.

For example:
For the matrix $A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 2 & 8\end{bmatrix}$, the column space $C(A)$ and the row space $C(A^T)$ have dimension 2, not 3.
For the matrix $B = \begin{bmatrix} 1 & 2\\ 2 & 2 \\ 3& 8 \end{bmatrix}$, the column space $C(B)$ and the row space $C(B^T)$ have dimension 2, not 3.

  1. Four Fundamental Subspaces of the matrix $A_{m \times n}$:
     1. The row space is $C(A^T)$, a subspace of $R^n$.
     2. The column space is $C(A)$, a subspace of $R^m$.
     3. The nullspace is $N(A)$, a subspace of $R^n$.
     4. The left nullspace is $N(A^T)$, a subspace of $R^m$.
        For the left nullspace we solve $A^T\mathbf y = \mathbf 0$ (the system is $n\times m$; it is the nullspace of $A^T$).
        The vectors $\mathbf y$ go on the left side of $A$ when the equation is written $\mathbf y^TA = \mathbf 0^T$.

The row space and column space have the same dimension $r = \text{rank}(A)$.
$N(A)$ and $N(A^T)$ have dimensions $n - r$ and $m - r$, to make up the full $n$ and $m$:

In $R^n$ the row space and nullspace have dimensions $r$ and $n-r$.
In $R^m$ the column space and left nullspace have dimensions $r$ and $m - r$.

  1. A matrix multiplies a vector: $A$ times $\mathbf x$.
  • At the first level this is only numbers.
  • At the second level $A\mathbf x$ is a combination of column vectors.
  • The third level shows subspaces: the big picture of the matrix $A_{m\times n}$.
    (Figure: the Big Picture of the four fundamental subspaces.)
  2. The row space is perpendicular to the nullspace.
     Every row of $A$ is perpendicular to every solution of $A\mathbf x = \mathbf 0$.
  3. The column space is perpendicular to the nullspace of $A^T$.
     When $b$ is outside the column space—when we want to solve $A\mathbf x = b$ and can't do it—then this nullspace of $A^T$ comes into its own. It contains the error $e = b - A\mathbf x$ in the "least-squares" solution.
  4. Every rank one matrix is one column times one row: $A= \mathbf u\mathbf v^T$.
     Rank Two Matrices = Rank One plus Rank One

For example (a numeric check follows the example):
$$A = \begin{bmatrix} 1 & 0 & 3 \\ 1&1&7 \\ 4&2&20\end{bmatrix} = \begin{bmatrix}1&0&0\\1&1&0\\4&2&1\end{bmatrix}\begin{bmatrix}1&0&3\\0&1&4\\0&0&0\end{bmatrix} = CR$$
The row space of $R$ clearly has two basis vectors $v_1^T = [1\ 0\ 3]$ and $v_2^T = [0\ 1\ 4]$, and the column space of $C$ clearly has two basis vectors $u_1 = (1,1,4)$, $u_2 = (0,1,2)$. Then
$$A = \begin{bmatrix}u_1&u_2&u_3\end{bmatrix} \begin{bmatrix}v_1^T\\v_2^T\\\text{zero row}\end{bmatrix} = u_1v_1^T + u_2v_2^T = (\text{rank 1}) + (\text{rank 1})$$
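
A quick NumPy check of this rank-one splitting (the matrices are the ones above):

```python
import numpy as np

A = np.array([[1, 0, 3],
              [1, 1, 7],
              [4, 2, 20]])

u1, u2 = np.array([1, 1, 4]), np.array([0, 1, 2])
v1, v2 = np.array([1, 0, 3]), np.array([0, 1, 4])

# rank-two matrix = (rank one) + (rank one)
print(np.array_equal(A, np.outer(u1, v1) + np.outer(u2, v2)))   # True
print(np.linalg.matrix_rank(A))                                  # 2
```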

Orthogonality

Orthogonality of the Four Subspaces

  1. Orthogonal vectors have $\mathbf v^T\mathbf w=0$. Then $||\mathbf v||^2+||\mathbf w||^2 = ||\mathbf v + \mathbf w||^2 = ||\mathbf v - \mathbf w||^2$.
  2. Subspaces $V$ and $W$ are orthogonal when $\mathbf v^T\mathbf w = 0$ for every $\mathbf v$ in $V$ and every $\mathbf w$ in $W$.
  3. The row space of $A$ is orthogonal to the nullspace. The column space is orthogonal to $N(A^T)$.
  4. Row space and nullspace are orthogonal complements: every $\mathbf x$ in $R^n$ splits into $\mathbf x_{row} + \mathbf x_{null}$.
  5. Every vector $\mathbf x$ in the nullspace is perpendicular to every row of $A$, because $A\mathbf x = \mathbf 0$.
     The nullspace $N(A)$ and the row space $C(A^T)$ are orthogonal subspaces of $R^n$.

Proof:
$$A\mathbf x = \begin{bmatrix}\text{row } 1 \\ \vdots \\ \text{row } m\end{bmatrix}\begin{bmatrix}\mathbf x\end{bmatrix} = \begin{bmatrix}0\\ \vdots \\0\end{bmatrix}$$
Clearly $(\text{row } i) \cdot \mathbf x$ is zero for every row $i$.
A second way to prove the orthogonality, for readers who like matrix shorthand: the vectors in the row space are combinations $A^T\mathbf y$ of the rows.
Take the dot product of $A^T\mathbf y$ with any $\mathbf x$ in the nullspace. These vectors are perpendicular:
$$\mathbf x^T(A^T\mathbf y) = (A\mathbf x)^T\mathbf y = \mathbf 0^T\mathbf y = 0$$

  1. Every vector $\mathbf y$ in the nullspace of $A^T$ is perpendicular to every column of $A$.
     The left nullspace $N(A^T)$ and the column space $C(A)$ are orthogonal in $R^m$.

For a visual proof, look at $A^T\mathbf y = \mathbf 0$. Each column of $A$ multiplies $\mathbf y$ to give 0:
$$C(A) \perp N(A^T): \quad A^T\mathbf y = \begin{bmatrix}\text{(column 1)}^T \\ \vdots \\ \text{(column } n)^T\end{bmatrix} \begin{bmatrix}\mathbf y\end{bmatrix} = \begin{bmatrix}0 \\ \vdots\\0\end{bmatrix}$$
The dot product of $\mathbf y$ with every column of $A$ is zero. Then $\mathbf y$ in the left nullspace is perpendicular to each column of $A$—and to the whole column space.

  1. The orthogonal complement of a subspace $V$ contains every vector that is perpendicular to $V$. This orthogonal subspace is denoted by $V^{\perp}$ (pronounced "$V$ perp").
     By this definition, the nullspace is the orthogonal complement of the row space.

Every $\mathbf x$ that is perpendicular to the rows satisfies $A\mathbf x = \mathbf 0$, and lies in the nullspace.

The reverse is also true. If $\mathbf v$ is orthogonal to the nullspace, it must be in the row space.
Otherwise we could add this $\mathbf v$ as an extra row of the matrix, without changing its nullspace. The row space would grow, which breaks the law $r + (n - r) = n$. We conclude that the nullspace complement $N(A)^{\perp}$ is exactly the row space $C(A^T)$.

In the same way, the left nullspace and column space are orthogonal in $R^m$, and they are orthogonal complements. Their dimensions $r$ and $m - r$ add to the full dimension $m$.

  1. $N(A)$ is the orthogonal complement of the row space $C(A^T)$ (in $R^n$).
  2. $N(A^T)$ is the orthogonal complement of the column space $C(A)$ (in $R^m$).

(Figure: vector space complements.)

The point of **"complements"** is that every $\mathbf x$ can be split into a row space component $\mathbf x_r$ and a nullspace component $\mathbf x_n$. When $A$ multiplies $\mathbf x = \mathbf x_r + \mathbf x_n$, the figure shows what happens to $A\mathbf x= A\mathbf {x_r} + A\mathbf {x_n}$:

The nullspace component goes to zero: $A\mathbf {x_n} = \mathbf 0$.
The row space component goes to the column space: $A\mathbf {x_r} = A\mathbf x$.

Every vector goes to the column space!
Multiplying by $A$ cannot do anything else. More than that: every vector $\mathbf b$ in the column space comes from one and only one vector $\mathbf x_r$ in the row space.

Proof: If $A\mathbf {x_r} = A\mathbf x_r'$, the difference $\mathbf x_r - \mathbf x_r'$ is in the nullspace. It is also in the row space, where $\mathbf x_r$ and $\mathbf x_r'$ came from. This difference must be the zero vector, because the nullspace and row space are perpendicular. Therefore $\mathbf x_r = \mathbf x_r'$.

  1. To repeat one clear fact: a row of $A$ can't be in the nullspace of $A$ (except for a zero row). The only vector in two orthogonal subspaces is the zero vector.

Projections

  1. The projection of a vector $b$ onto the line through $a$ is the closest point $p=a\dfrac{a^Tb}{a^Ta}$.
     The projection matrix is $P = \dfrac{aa^T}{a^Ta}$.

$p = \lambda a = a \lambda = a\dfrac{a^Tb}{a^Ta} = \dfrac{aa^T}{a^Ta}b = Pb$; so $P$ is the projection matrix that carries out the transformation, and $p$ is the projected vector.

The error $e = b-p$ is perpendicular to $a$: the right triangle with sides $p, e, b$ satisfies $||p||^2 + ||e||^2 = ||b||^2$.
  2. The projection of $b$ onto a subspace $S$ is the closest vector $p$ in $S$; $b - p$ is orthogonal to $S$.

Assume $n$ linearly independent vectors $a_1, \cdots , a_n$ in $R^m$.
Find the combination $p = \hat x_1 a_1 + \cdots + \hat x_n a_n$ closest to a given vector $b$.

We compute projections onto $n$-dimensional subspaces in 3 steps, as before:

  1. Find the vector $\hat{\mathbf x}$.
  2. Find the projection $p = A\hat{\mathbf x}$.
  3. Find the projection matrix $P$.

The key is that $b$ projects to the nearest point $A\hat{\mathbf x}$ in the subspace. The error vector $b - A\hat{\mathbf x}$ is perpendicular to the subspace: it makes a right angle with all the basis vectors $a_1,\cdots,a_n$. The $n$ right angles give the $n$ equations for $\hat{\mathbf x}$:
$$\begin{bmatrix}-a_1^T-\\ \vdots \\ -a_n^T-\end{bmatrix}\begin{bmatrix}b - A\hat{\mathbf x}\end{bmatrix} = \begin{bmatrix}0\end{bmatrix}$$
The matrix with those rows $a_i^T$ is $A^T$. The $n$ equations are exactly $A^T(b - A\hat{\mathbf x}) = 0$.
Rewrite $A^T(b - A\hat{\mathbf x}) = 0$ in its famous form $A^TA\hat{\mathbf x} = A^Tb$.

The combination $p=\hat x_1a_1 + \cdots + \hat x_n a_n = A\hat{\mathbf x}$ that is closest to $b$ comes from $\hat{\mathbf x}$:
Find $\hat{\mathbf x}$ ($n\times 1$): solve $A^T(b - A\hat{\mathbf x}) = 0$, or $A^TA\hat{\mathbf x} = A^Tb$.
This symmetric matrix $A^TA$ is $n$ by $n$. It is invertible if the $a$'s are independent.
The solution is $\hat{\mathbf x} = (A^TA)^{-1}A^Tb$. The projection of $b$ onto the subspace is $p$.
Find $p$ ($m\times 1$): $p = A\hat{\mathbf x} = A(A^TA)^{-1}A^Tb$.
The next formula picks out the projection matrix that multiplies $b$ above.
Find $P$ ($m\times m$): $P = A(A^TA)^{-1}A^T$.

Compare with projection onto a line, when $A$ has only one column: $A^TA$ is $a^Ta$.
For $n = 1$: $\hat x = \dfrac{a^Tb}{a^Ta}, \quad p = a\dfrac{a^Tb}{a^Ta}, \quad P = \dfrac{aa^T}{a^Ta}$
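
A minimal numeric sketch of these three steps (made-up $A$ and $b$; in practice one solves the normal equations rather than forming the inverse):

```python
import numpy as np

A = np.array([[1., 0.],
              [1., 1.],
              [1., 2.]])          # two independent columns in R^3
b = np.array([6., 0., 0.])

xhat = np.linalg.solve(A.T @ A, A.T @ b)   # normal equations  A^T A xhat = A^T b
p = A @ xhat                               # projection of b onto C(A)
P = A @ np.linalg.inv(A.T @ A) @ A.T       # projection matrix (fine for a small demo)

e = b - p
print(np.allclose(A.T @ e, 0))     # the error is perpendicular to the column space
print(np.allclose(P @ b, p))       # Pb reproduces the projection
```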

The key step was $A^T (b - A\hat x) = 0$. We expressed it by geometry. Linear algebra gives this "normal equation" too, in a very quick and beautiful way:

  1. Our subspace is the column space of $A$.
  2. The error vector $b-A\hat x$ is perpendicular to that column space.
  3. Therefore $b-A\hat x$ is in the nullspace of $A^T$, which means $A^T(b - A\hat x) = 0$.

The left nullspace is important in projections. That nullspace of $A^T$ contains the error vector $e = b- A\hat x$. The vector $b$ is split into the projection $p$ and the error $e = b - p$. Projection produces a right triangle with sides $p, e, b$.

  1. $A^TA$ is invertible if and only if $A$ has linearly independent columns.

Least Squares Approximations

  1. When $Ax=b$ has no solution, multiply by $A^T$ and solve $A^TA\hat x=A^Tb$.

It often happens that $Ax = b$ has no solution. The usual reason is: too many equations.
The matrix $A$ has more rows than columns. There are more equations than unknowns ($m>n$). The $n$ columns span only a small part of $m$-dimensional space. Unless all measurements are perfect, $b$ is outside that column space of $A$. Elimination reaches an impossible equation and stops. But we can't stop just because measurements include noise!
With projection we replace $b$ by the closest vector $p$ in the column space and solve that problem instead.

So solving $A^TA\hat x=A^Tb$ gives the projection $p=A\hat x$ of $b$ onto the column space of $A$.

  1. When $Ax=b$ has no solution, $\hat x$ is the "least-squares solution": $||b - A\hat x||^2 = \text{minimum}$.

Setting the partial derivatives of $E = ||Ax - b||^2$ to zero ($\frac{\partial E}{\partial x_i} = 0$) also produces $A^TA\hat x =A^Tb$.

  1. To fit points $(t_1,b_1),\cdots,(t_m,b_m)$ by a straight line, $A$ has columns $(1,\cdots,1)$ and $(t_1,\cdots,t_m)$.

$Ax = b$ is $\begin{matrix} C + Dt_1 = b_1\\ C + Dt_2 = b_2 \\ \vdots\\ C + Dt_m = b_m\end{matrix}$ with $A = \begin{bmatrix}1 & t_1 \\ 1 & t_2 \\ \vdots & \vdots \\ 1 & t_m\end{bmatrix}$.
The column space is so thin that almost certainly $b$ is outside of it. When $b$ happens to lie in the column space, the points happen to lie on a line. In that case $b = p$, $Ax = b$ is solvable, and the errors are $e = (0,\cdots, 0)$.

Turning the straight-line fit into least squares, solve $A^TA\hat x=A^Tb$ for $\hat x=(C,D)$. The errors are $e_i=b_i-C-Dt_i$.

$$A^TA = \begin{bmatrix}1 & \cdots & 1\\ t_1 & \cdots & t_m\end{bmatrix}\begin{bmatrix}1 & t_1 \\ \vdots & \vdots \\ 1 & t_m\end{bmatrix} = \begin{bmatrix}m & \sum t_i \\ \sum t_i & \sum t_i^2\end{bmatrix}$$
$$A^Tb = \begin{bmatrix}1 & \cdots & 1\\ t_1 & \cdots & t_m\end{bmatrix}\begin{bmatrix}b_1 \\ \vdots \\b_m\end{bmatrix} = \begin{bmatrix}\sum b_i \\ \sum t_ib_i\end{bmatrix}$$
$$A^TA\hat x=A^Tb \to \begin{bmatrix}m & \sum t_i \\ \sum t_i & \sum t_i^2\end{bmatrix}\begin{bmatrix}C \\ D\end{bmatrix} = \begin{bmatrix}\sum b_i \\ \sum t_ib_i\end{bmatrix}$$

The best $\hat x = (C, D)$ is $(A^TA)^{-1}A^Tb$.
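
A small least-squares line fit in NumPy (made-up data points; `lstsq` minimizes the same squared error as the normal equations):

```python
import numpy as np

t = np.array([0., 1., 2., 3.])
b = np.array([1., 9., 9., 21.])            # made-up measurements

A = np.column_stack([np.ones_like(t), t])  # columns (1,...,1) and (t_1,...,t_m)

# solve A^T A xhat = A^T b directly ...
xhat = np.linalg.solve(A.T @ A, A.T @ b)
# ... or let lstsq minimize ||b - Ax||^2
xhat2, *_ = np.linalg.lstsq(A, b, rcond=None)

C, D = xhat
print(C, D, np.allclose(xhat, xhat2))      # best line  b ≈ C + D t
```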

Orthonormal Bases and Gram-Schmidt

  1. The columns $q_1,\cdots,q_n$ are orthonormal if $q_i^Tq_j = \begin{cases}0 & \text{for } i \ne j \\ 1 & \text{for } i = j\end{cases}$. Then $Q^TQ = I$.

If $Q$ is also square then $Q^T = Q^{-1}$.
The least squares solution to $Q\mathbf x = b$ is $\hat{\mathbf x} = Q^Tb$. The projection of $b$ is $p = QQ^Tb = Pb$.

  1. With three independent vectors $a, b, c$, construct three orthogonal vectors $A, B, C$ (Gram-Schmidt):
     let $A = a$,
     then $B = b - \dfrac{A^Tb}{A^TA}A$,
     then $C = c - \dfrac{A^Tc}{A^TA}A - \dfrac{B^Tc}{B^TB}B$.

The factorization $A = QR$ with $q_1 = \frac{A}{||A||}, q_2 = \frac{B}{||B||}, q_3 = \frac{C}{||C||}$:
$$\begin{bmatrix}a & b & c\end{bmatrix} = \begin{bmatrix}q_1 & q_2 & q_3\end{bmatrix}\begin{bmatrix}q_1^Ta & q_1^Tb & q_1^Tc \\ & q_2^Tb & q_2^Tc\\ & & q_3^Tc\end{bmatrix}, \text{ which is } A = QR$$
Clearly, $R = Q^TA$.

  1. With $A = QR$: $A^TA=(QR)^TQR = R^TQ^TQR = R^TR$.
     So the least squares equation becomes (see the sketch after this list)
     $$A^TA\hat x = A^Tb \to R^TR\hat x = R^TQ^Tb \to R\hat x = Q^Tb$$
     $$\hat x = R^{-1}Q^Tb$$
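
A minimal Gram-Schmidt sketch (classical version, no pivoting; compare with `np.linalg.qr`) together with the least-squares solve $R\hat x = Q^Tb$ that follows from it; the matrix and right side are made up:

```python
import numpy as np

def gram_schmidt(M):
    """Classical Gram-Schmidt on the columns of M; returns Q (orthonormal) and R."""
    m, n = M.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = M[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ M[:, j]   # component along the earlier q_i
            v -= R[i, j] * Q[:, i]        # subtract that projection
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]
    return Q, R

A = np.array([[1., 1., 0.],
              [1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 1.]])
b = np.array([1., 2., 3., 4.])

Q, R = gram_schmidt(A)
xhat = np.linalg.solve(R, Q.T @ b)        # R xhat = Q^T b  (least squares)
print(np.allclose(A, Q @ R),
      np.allclose(xhat, np.linalg.lstsq(A, b, rcond=None)[0]))
```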

Determinants

  1. The determinant of an $n$ by $n$ matrix can be found in three ways:
     1. Pivot formula: multiply the $n$ pivots (times 1 or -1).
     2. "Big" formula: add up $n!$ terms (times 1 or -1).
     3. Cofactor formula: combine $n$ smaller determinants (times 1 or -1).

  1. The properties of the determinant:
     1. The determinant of the $n$ by $n$ identity matrix is 1.
     2. The determinant changes sign when two rows are exchanged (sign reversal).
     3. The determinant is a linear function of each row separately (all other rows stay fixed).

If the first row is multiplied by $t$, the determinant is multiplied by $t$. If first rows are added, determinants are added. This rule only applies when the other rows do not change! Notice how $c$ and $d$ stay the same:
Multiply row 1 by any number $t$; the determinant is multiplied by $t$: $\begin{vmatrix}ta & tb \\ c& d\end{vmatrix} = t\begin{vmatrix}a & b \\ c & d\end{vmatrix}$
Add row 1 of $A$ to row 1 of $A'$; then determinants add: $\begin{vmatrix}a + a' & b + b' \\ c& d\end{vmatrix} = \begin{vmatrix}a & b \\ c & d\end{vmatrix} + \begin{vmatrix}a' & b' \\ c & d\end{vmatrix}$

  1. If two rows of $A$ are equal, then $\det A = 0$.
  2. Subtracting a multiple of one row from another row leaves $\det A$ unchanged.

The determinant is not changed by the usual elimination steps from $A$ to $U$.
Thus $\det A = \det U$. If we can find determinants of triangular matrices $U$, we can find determinants of all matrices $A$. Every row exchange reverses the sign, so in general $\det A= \pm \det U$.

  1. A matrix with a row of zeros has $\det A = 0$.
  2. If $A$ is triangular then $\det A = a_{11}a_{22}\cdots a_{nn} = \text{product of diagonal entries}$.
  3. If $A$ is singular then $\det A=0$. If $A$ is invertible then $\det A\ne 0$.
  4. $\det A= \pm \det U = \pm(\text{product of the pivots})$.
  5. The determinant of $AB$ is $\det A$ times $\det B$: $\det(AB) = \det(A)\det(B)$, or $|AB| = |A||B|$.

$(\det A)(\det A^{-1}) = \det I = 1$

  1. The transpose $A^T$ has the same determinant as $A$: $\det A= \det A^T$.
  1. The Pivot Formula

When elimination leads to $A = LU$, the pivots $d_1 , \cdots , d_n$ are on the diagonal of the upper triangular $U$. If no row exchanges are involved, multiply those pivots to find the determinant:
$$\det A = (\det L)(\det U) = (1)(d_1d_2\cdots d_n)$$
With row exchanges, $(\det P)(\det A) = (\det L)(\det U)$, so $\det A = \pm(d_1d_2\cdots d_n)$.
Each pivot is a ratio of determinants: the $k$th pivot is $d_k = \dfrac{d_1d_2\cdots d_k}{d_1d_2\cdots d_{k-1}} = \dfrac{\det A_k}{\det A_{k-1}}$, where $A_k$ is the upper-left $k$ by $k$ submatrix.

  1. The Big Formula for Determinants

The formula has $n!$ terms. Its size grows fast because $n! = 1, 2, 6, 24, 120, \cdots$
The determinant of $A$ is the sum of these $n!$ simple determinants times 1 or -1.
The simple determinants $a_{1\alpha}a_{2\beta}\cdots a_{n\omega}$ choose one entry from every row and column.

$$\det A = \text{sum over all } n! \text{ column permutations } P = (\alpha, \beta, \cdots, \omega) = \sum(\det P)\,a_{1\alpha}a_{2\beta}\cdots a_{n\omega} = \text{Big Formula}$$

  1. Determinant by Cofactors

The determinant is the dot product of any row $i$ of $A$ with its cofactors using the other rows:
Cofactor formula: $\det A = a_{i1}C_{i1} + a_{i2}C_{i2} + \cdots + a_{in}C_{in}$
Each cofactor $C_{ij}$ (of order $n - 1$, without row $i$ and column $j$) includes its correct sign:
Cofactor: $C_{ij} = (-1)^{i+j}\det M_{ij}$
The submatrix $M_{ij}$ throws out row $i$ and column $j$.
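
A tiny recursive implementation of this cofactor expansion (it does $n!$ work, so it is only a teaching sketch for small matrices):

```python
import numpy as np

def det_by_cofactors(A):
    """Expand along the first row: det A = sum_j a_0j * (-1)^j * det(M_0j)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        M = np.delete(np.delete(A, 0, axis=0), j, axis=1)   # drop row 0 and column j
        total += (-1) ** j * A[0, j] * det_by_cofactors(M)
    return total

A = np.array([[2., 1., 3.],
              [0., 4., 1.],
              [5., 2., 2.]])
print(det_by_cofactors(A), np.linalg.det(A))   # the two values agree (-43)
```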

Cramer’s Rule, Inverses, and Volumes

  1. Cramer's Rule solves $Ax = b$.

A neat idea gives the first component $x_1$. Replacing the first column of $I$ by $x$ gives a matrix with determinant $x_1$. When you multiply it by $A$, the first column becomes $Ax$, which is $b$. The other columns of $B_1$ are copied from $A$:
Key idea: $$\begin{bmatrix}A\end{bmatrix} \begin{bmatrix}x_1 & 0 & 0 \\ x_2 & 1 & 0 \\ x_3 & 0 &1\end{bmatrix} = \begin{bmatrix}b_1 & a_{12} & a_{13} \\ b_2 & a_{22} & a_{23} \\ b_3 & a_{32} & a_{33}\end{bmatrix} = B_1$$
By the product rule: $(\det A)\det\begin{bmatrix}x_1 & 0 & 0 \\ x_2 & 1 & 0 \\ x_3 & 0 &1\end{bmatrix} = \det B_1 \to (\det A)(x_1) = \det B_1 \to x_1 = \dfrac{\det B_1}{\det A}$
The same procedure gives $x_2 = \dfrac{\det B_2}{\det A},\ x_3 = \dfrac{\det B_3}{\det A}$.

If $\det A$ is not zero, $Ax = b$ is solved by determinants: $x_1 = \dfrac{\det B_1}{\det A},\ x_2 = \dfrac{\det B_2}{\det A},\ \cdots,\ x_n = \dfrac{\det B_n}{\det A}$

The matrix $B_j$ has the $j$th column of $A$ replaced by the vector $b$.
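
A direct NumPy sketch of Cramer's Rule (only sensible for small systems; the example is made up):

```python
import numpy as np

def cramer(A, b):
    d = np.linalg.det(A)
    x = np.empty(len(b))
    for j in range(len(b)):
        Bj = A.copy()
        Bj[:, j] = b              # B_j: column j of A replaced by b
        x[j] = np.linalg.det(Bj) / d
    return x

A = np.array([[2., 1., 1.],
              [1., 3., 2.],
              [1., 0., 0.]])
b = np.array([4., 5., 6.])
print(cramer(A, b), np.linalg.solve(A, b))   # same answer both ways
```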

  1. $A^{-1}$ involves the cofactors. When the right side is a column of the identity matrix $I$, as in $AA^{-1} = I$, the determinant of each $B_j$ in Cramer's Rule is a cofactor of $A$.

The $i,j$ entry of $A^{-1}$ is the cofactor $C_{ji}$ (not $C_{ij}$) divided by $\det A$:
$$\text{Formula for } A^{-1}: \quad (A^{-1})_{ij} = \frac{C_{ji}}{\det A} \quad\to\quad A^{-1} = \frac{C^T}{\det A}$$

  1. Area of a Triangle with 3 points $(x_1, y_1), (x_2, y_2), (x_3, y_3)$

Determinants are the best way to find area. The area of a triangle is half of a 3 by 3 determinant:
$$S_{\text{triangle}} = \frac{1}{2}\begin{vmatrix}x_1 & y_1 & 1\\x_2 & y_2 & 1\\x_3 & y_3 & 1\end{vmatrix} = \frac{1}{2} \begin{vmatrix}x_1 & y_1 \\ x_2 & y_2\end{vmatrix} \text{ when } (x_3, y_3) = (0, 0)$$

  1. The Cross Product

The cross product of $u= (u_1, u_2, u_3)$ and $v= (v_1, v_2, v_3)$ is a vector:
$$u \times v = \begin{vmatrix}\mathbf i & \mathbf j & \mathbf k\\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3\end{vmatrix} = (u_2v_3 - u_3v_2)\mathbf i + (u_3v_1 - u_1v_3)\mathbf j + (u_1v_2 - u_2v_1)\mathbf k$$
This vector $u \times v$ is perpendicular to $u$ and $v$. The cross product $v \times u$ is $-(u \times v)$.
That is, $u\times v \perp u$ and $u\times v \perp v$, with $||u\times v|| = ||u||\,||v||\,|\sin\theta|$, while $|u\cdot v| = ||u||\,||v||\,|\cos\theta|$.

Eigenvalues and Eigenvectors

The first part was about $Ax = b$: balance and equilibrium and steady state.
Now the second part is about change. Time enters the picture—continuous time in a differential equation $du/dt = Au$, or time steps in a difference equation $u_{k+1} = Au_k$. Those equations are NOT solved by elimination.

The key idea is to avoid all the complications presented by the matrix $A$.
Suppose the solution vector $u(t)$ stays in the direction of a fixed vector $x$. Then we only need to find the number (changing with time) that multiplies $x$. A number is easier than a vector. We want "eigenvectors" $x$ that don't change direction when you multiply by $A$.

  1. Vectors that do not change direction when multiplied by $A$ (so $Ax$ is a multiple of $x$) are the "eigenvectors".

The basic equation is $Ax=\lambda x$. The number $\lambda$ is an eigenvalue of $A$.

  1. The number $\lambda$ is an eigenvalue of $A$ if and only if $A - \lambda I$ is singular: $\det(A - \lambda I) = 0$.
  2. The sum of the entries along the main diagonal is called the **trace** of $A$.
     1. The sum of the $n$ eigenvalues equals the sum of the $n$ diagonal entries:
        $$\lambda_1 + \lambda_2 + \cdots + \lambda_n = a_{11} + a_{22} + \cdots + a_{nn} = \text{trace} \quad\to\quad \sum_{i=1}^n \lambda_i = \sum_{i = 1}^na_{ii}$$
     2. The product of the $n$ eigenvalues equals the determinant:
        $$\lambda_1 \lambda_2 \cdots\lambda_n = \det A \quad\to\quad \prod_{i=1}^n\lambda_i = \det A$$
  1. The matrix $A$ turns into a diagonal matrix $\varLambda$ when we use the eigenvectors properly.

Diagonalization: suppose the $n$ by $n$ matrix $A$ has $n$ linearly independent eigenvectors $x_1, \cdots, x_n$. Put them into the columns of an eigenvector matrix $X$. Then $X^{-1}AX$ is the eigenvalue matrix $\varLambda$:

$$AX = A\begin{bmatrix}x_1 & \cdots & x_n\end{bmatrix} = \begin{bmatrix}\lambda_1 x_1 & \cdots & \lambda_nx_n\end{bmatrix} = \begin{bmatrix}x_1 & \cdots & x_n\end{bmatrix}\begin{bmatrix}\lambda_1 & & \\ & \ddots & \\ & & \lambda_n\end{bmatrix} = X\varLambda$$
$$\to AX = X\varLambda$$
$$\to X^{-1}AX = \varLambda = \begin{bmatrix}\lambda_1 & & \\ & \ddots & \\ & & \lambda_n\end{bmatrix}$$
$$\to A = X\varLambda X^{-1} \to A^2 = X\varLambda X^{-1}X\varLambda X^{-1} = X\varLambda^2X^{-1}$$
$$\to A^n = X\varLambda^nX^{-1}$$

The matrix $X$ has an inverse, because its columns (the eigenvectors of $A$) were assumed to be linearly independent. Without $n$ independent eigenvectors, we can't diagonalize.
$A$ and $\varLambda$ have the same eigenvalues $\lambda_1,\cdots,\lambda_n$. The eigenvectors are different.
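
A quick numeric confirmation with `np.linalg.eig` (made-up matrix with distinct eigenvalues, so the eigenvectors are independent):

```python
import numpy as np

A = np.array([[4., 1.],
              [2., 3.]])           # eigenvalues 5 and 2

lam, X = np.linalg.eig(A)          # columns of X are eigenvectors
Lam = np.diag(lam)

print(np.allclose(A @ X, X @ Lam))                           # AX = X Lambda
print(np.allclose(A, X @ Lam @ np.linalg.inv(X)))            # A  = X Lambda X^{-1}
print(np.allclose(np.linalg.matrix_power(A, 5),
                  X @ np.diag(lam**5) @ np.linalg.inv(X)))   # A^5 = X Lambda^5 X^{-1}
```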

  1. Similar Matrices: Same Eigenvalues

Suppose the eigenvalue matrix $\varLambda$ is fixed. As we change the eigenvector matrix $X$, we get a whole family of different matrices $A=X\varLambda X^{-1}$—all with the same eigenvalues in $\varLambda$. All those matrices $A$ (with the same $\varLambda$) are called similar.

This idea extends to matrices that can't be diagonalized. Again we choose one constant matrix $C$ (not necessarily $\varLambda$), and we look at the whole family of matrices $A = BCB^{-1}$, allowing all invertible matrices $B$. Again those matrices $A$ and $C$ are called similar.

All the matrices $A = BCB^{-1}$ are "similar". They all share the eigenvalues of $C$.

Proof: let $Cx = \lambda x$. Then $(BCB^{-1})(Bx) = BCx = B\lambda x = \lambda (Bx)$, so $A(Bx) = \lambda(Bx)$.
So for any invertible $B$, the matrix $A = BCB^{-1}$ in the family has the same eigenvalues as the fixed $C$, with eigenvectors $Bx$ instead of $x$.

  1. Solve the Fibonacci Numbers

"Every new Fibonacci number is the sum of the two previous F's." Find $F_{100}$, where
$$F_0 = 0,\quad F_1 = 1,\quad F_{k+2} = F_{k+1} + F_k.$$

The key is to begin with a matrix equation $u_{k+1} = Au_k$. That is a one-step rule for vectors, while Fibonacci gave a two-step rule for scalars. We match those rules by putting two Fibonacci numbers into a vector. Then you will see the matrix $A$.
Let $u_k = \begin{bmatrix}F_{k+1} \\ F_k\end{bmatrix} \to u_{k+1} = \begin{bmatrix}1 & 1\\ 1& 0\end{bmatrix}u_k$
Every step multiplies by $A = \begin{bmatrix} 1 & 1 \\ 1 & 0\end{bmatrix}$. After 100 steps we reach $u_{100} = A^{100}u_0$.
This problem is just right for eigenvalues. Subtract $\lambda$ from the diagonal of $A$:
$$A - \lambda I = \begin{bmatrix}1-\lambda & 1 \\ 1 & -\lambda \end{bmatrix} \to \det(A-\lambda I) = \lambda^2 - \lambda - 1 = 0$$
$\lambda_1,\lambda_2$ are the roots of this quadratic equation—the eigenvalues.
So the eigenvectors are $x_1 = [\lambda_1, 1]^T$ and $x_2 = [\lambda_2, 1]^T$.
Find the combination of those eigenvectors that gives $u_0 =[1,0]^T$:
$$\begin{bmatrix}1\\0\end{bmatrix} = \frac{1}{\lambda_1 - \lambda_2}\bigg(\begin{bmatrix}\lambda_1 \\ 1\end{bmatrix} - \begin{bmatrix}\lambda_2 \\ 1\end{bmatrix} \bigg) \to u_0 = \frac{x_1 - x_2}{\lambda_1 - \lambda_2}$$
Multiply $u_0$ by $A^{100}$ to find $u_{100}$. The eigenvectors $x_1$ and $x_2$ stay separate!

$$u_1 = Au_0 = A\frac{x_1 - x_2}{\lambda_1 - \lambda_2} = \frac{1}{\lambda_1 - \lambda_2}(Ax_1 - Ax_2) = \frac{1}{\lambda_1 - \lambda_2}(\lambda_1x_1 - \lambda_2x_2)$$
$$u_2 = Au_1 = A\frac{\lambda_1x_1 - \lambda_2x_2}{\lambda_1 - \lambda_2} = \frac{1}{\lambda_1 - \lambda_2}(\lambda_1Ax_1 - \lambda_2Ax_2) = \frac{1}{\lambda_1 - \lambda_2}(\lambda_1^2x_1 - \lambda_2^2x_2)$$
$$\vdots$$
$$u_n = \frac{\lambda_1^nx_1 - \lambda_2^nx_2}{\lambda_1 - \lambda_2} \to u_{100} = \frac{(\lambda_1)^{100}x_1 - (\lambda_2)^{100}x_2}{\lambda_1 - \lambda_2}$$

Conclusion: solve $u_{k+1}= Au_k$ by $u_k = A^ku_0 = X\varLambda^kX^{-1}u_0 = c_1(\lambda_1)^kx_1 + \cdots + c_n(\lambda_n)^kx_n$.
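
A numeric version of this eigenvalue formula (NumPy floats lose precision for very large Fibonacci numbers, so the exact integer recurrence is shown alongside as a check):

```python
import numpy as np

A = np.array([[1., 1.],
              [1., 0.]])
lam, X = np.linalg.eig(A)                    # lam = (1 ± sqrt(5)) / 2
c = np.linalg.solve(X, np.array([1., 0.]))   # u_0 = X c

def fib_eig(k):
    # u_k = X Lambda^k c ; the second component of u_k is F_k
    return (X @ (lam**k * c))[1]

def fib_exact(k):
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a

print(round(fib_eig(30)), fib_exact(30))     # 832040 832040
```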

Fibonacci's example is a typical difference equation $u_{k+1} = Au_k$. Each step multiplies by $A$. The solution is $u_k = A^ku_0$. We want to make clear how diagonalizing the matrix gives a quick way to compute $A^k$ and find $u_k$, in three steps.

The eigenvector matrix $X$ produces $A= X\varLambda X^{-1}$. This is a factorization of the matrix, like $A= LU$ or $A= QR$. The new factorization is perfectly suited to computing powers, because every time $X^{-1}$ multiplies $X$ we get $I$.

**Powers of $A$:** $A^ku_0 = (X\varLambda X^{-1})\cdots(X\varLambda X^{-1})u_0 = X\varLambda^k X^{-1}u_0$
Split $X\varLambda^k X^{-1}u_0$ into three steps that show how eigenvalues work:

  1. Write $u_0$ as a combination $c_1 x_1 + \cdots + c_nx_n$ of the eigenvectors. Then $c = X^{-1}u_0$:

$$u_0 = \begin{bmatrix}x_1 & \cdots & x_n\end{bmatrix}\begin{bmatrix}c_1 \\ \vdots \\ c_n\end{bmatrix} \to \begin{cases}u_0 = Xc \\ c = X^{-1}u_0\end{cases}$$

  2. Multiply each eigenvector $x_i$ by $(\lambda_i)^k$. Now we have $\varLambda^kX^{-1}u_0$.
  3. Add up the pieces $c_i(\lambda_i)^kx_i$ to find the solution $u_k= A^ku_0$. This is $X\varLambda^kX^{-1}u_0$:

$$A^ku_0 = X\varLambda^kX^{-1}u_0 = X\varLambda^kc = \begin{bmatrix}x_1 & \cdots & x_n\end{bmatrix}\begin{bmatrix}\lambda_1^k & & \\ & \ddots & \\ & & \lambda_n^k\end{bmatrix}\begin{bmatrix}c_1 \\ \vdots \\ c_n\end{bmatrix}$$
$$u_k = c_1(\lambda_1)^kx_1 + \cdots + c_n(\lambda_n)^kx_n$$

Behind these numerical examples lies a fundamental idea: follow the eigenvectors.

This is the crucial link from linear algebra to differential equations ($\lambda^k$ will become $e^{\lambda t}$). Later we'll see the same idea as "transforming to an eigenvector basis." The best example of all is a Fourier series, built from the eigenvectors $e^{ikx}$ of $d/dx$.

  1. Systems of Differential Equations

The whole point of this section is to convert constant-coefficient differential equations into linear algebra.
A system of $n$ equations $\dfrac{du}{dt} = Au$, starting from the vector $u(0) = \begin{bmatrix}u_1(0) \\ \vdots \\ u_n(0)\end{bmatrix}$ at $t = 0$.

If $Ax = \lambda x$ then $u(t) = e^{\lambda t}x$ solves $\dfrac{du}{dt} = Au$. Each $\lambda$ and $x$ give a solution $e^{\lambda t}x$.
If $A = X\varLambda X^{-1}$, then $u(t) = e^{At}u(0) = Xe^{\varLambda t}X^{-1}u(0) = c_1e^{\lambda_1t}x_1 + \cdots + c_ne^{\lambda_nt}x_n$.
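
A small check that the eigenvalue formula matches the matrix exponential (SciPy's `expm`; made-up $A$ and $u(0)$):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0., 1.],
              [-2., -3.]])        # real eigenvalues -1 and -2
u0 = np.array([1., 0.])
t = 1.5

lam, X = np.linalg.eig(A)
c = np.linalg.solve(X, u0)                        # u(0) = X c

u_eig = X @ (np.exp(lam * t) * c)                 # sum of c_i e^{lambda_i t} x_i
u_expm = expm(A * t) @ u0                         # e^{At} u(0)
print(np.allclose(u_eig, u_expm))                 # True
```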

  1. Symmetric Matrices

It is no exaggeration to say that symmetric matrices $S$ are the most important matrices.

  1. A symmetric matrix has only real eigenvalues.
  2. The eigenvectors can be chosen orthonormal.

Those $n$ orthonormal eigenvectors go into the columns of $X$. Every symmetric matrix can be diagonalized. Its eigenvector matrix $X$ becomes an orthogonal matrix $Q$. Orthogonal matrices have $Q^{-1} = Q^T$—what we suspected about the eigenvector matrix is true. To remember it we write $Q$ instead of $X$ when we choose orthonormal eigenvectors.

Every symmetric matrix has the factorization $S= Q\varLambda Q^T$ with real eigenvalues in $\varLambda$ and orthonormal eigenvectors in the columns of $Q$.
Symmetric diagonalization: $S = Q\varLambda Q^{-1} = Q\varLambda Q^T$
This says that every 2 by 2 symmetric matrix is (rotation)(stretch)(rotate back):
$$S = Q\varLambda Q^T = \begin{bmatrix}q_1 & q_2\end{bmatrix} \begin{bmatrix}\lambda_1 & \\ & \lambda_2\end{bmatrix} \begin{bmatrix}q_1^T \\ q_2^T\end{bmatrix} = \lambda_1q_1q_1^T + \lambda_2q_2q_2^T$$
For every symmetric matrix: $S = Q\varLambda Q^T = \lambda_1q_1q_1^T +\cdots+ \lambda_nq_nq_n^T$
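
A numeric check of $S = Q\varLambda Q^T$ as a sum of rank-one pieces (`np.linalg.eigh` returns real eigenvalues and orthonormal eigenvectors for symmetric input; the matrix is made up):

```python
import numpy as np

S = np.array([[2., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])

lam, Q = np.linalg.eigh(S)                # real eigenvalues, orthonormal columns in Q
print(np.allclose(Q.T @ Q, np.eye(3)))    # Q^T Q = I

# rebuild S from its spectral pieces  lambda_i q_i q_i^T
S_rebuilt = sum(lam[i] * np.outer(Q[:, i], Q[:, i]) for i in range(3))
print(np.allclose(S, S_rebuilt))          # True
```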

Complex Eigenvalues of Real Matrices

For a symmetric matrix, $\lambda$ and $x$ turn out to be real. The two conjugate equations become the same. But a nonsymmetric matrix can easily produce $\lambda$'s and $x$'s that are complex.

For real matrices, complex $\lambda$'s and $x$'s come in "conjugate pairs":
$$\begin{cases}\lambda = a + ib \\ \bar\lambda = a - ib\end{cases} \qquad \text{If } Ax = \lambda x \text{ then } A\bar x = \bar \lambda \bar x$$

Eigenvalues versus Pivots

The eigenvalues of $A$ are very different from the pivots.

  • For eigenvalues, we solve $\det(A - \lambda I) = 0$.
  • For pivots, we use elimination.

The only connection so far is: product of pivots = determinant = product of eigenvalues.
We are assuming a full set of pivots $d_1, \cdots, d_n$. There are $n$ real eigenvalues $\lambda_1, \cdots , \lambda_n$. The $d$'s and $\lambda$'s are not the same, but they come from the same symmetric matrix. The $d$'s and $\lambda$'s have a hidden relation: for symmetric matrices the pivots and the eigenvalues have the same signs.

The number of positive eigenvalues of $S = S^T$ equals the number of positive pivots.
Special case: $S$ has all $\lambda_i > 0$ if and only if all pivots are positive.

Positive Definite Matrices

This section concentrates on symmetric matrices that have positive eigenvalues.

$$\text{Symmetric } S: \quad \text{all eigenvalues} > 0 \iff \text{all pivots} > 0 \iff \text{all upper left determinants} > 0$$

@(Energy-based Definition)

$S$ is positive definite if $x^TSx > 0$ for every nonzero vector $x$.

$Sx = \lambda x \to x^TSx = \lambda x^Tx$. The right side is a positive $\lambda$ times a positive number $x^Tx = ||x||^2$. So the left side $x^TSx$ is positive for any eigenvector.

If $S$ and $T$ are symmetric positive definite, so is $S + T$.

If the columns of $A$ are independent, then $S = A^TA$ is positive definite:

$$x^TSx = x^TA^TAx = (Ax)^T(Ax) = ||Ax||^2 \ge 0, \quad \text{with } ||Ax||^2 > 0 \text{ when } x \ne 0 \text{ because independent columns give } Ax \ne 0$$
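
A quick numeric illustration (made-up $A$ with independent columns; a successful Cholesky factorization is a standard positive-definiteness test):

```python
import numpy as np

A = np.array([[1., 0.],
              [1., 1.],
              [0., 2.]])          # independent columns
S = A.T @ A

print(np.linalg.eigvalsh(S))       # all eigenvalues positive
np.linalg.cholesky(S)              # succeeds only for positive definite S

x = np.array([3., -1.])
print(x @ S @ x, np.linalg.norm(A @ x)**2)   # the "energy" x^T S x equals ||Ax||^2
```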

@(Positive Semidefinite Matrices)

Often we are at the edge of positive definiteness. The determinant is zero. The smallest eigenvalue is zero. These matrices on the edge are called positive semidefinite. Here are two examples (not invertible): $S = \begin{bmatrix}1 & 2\\2 & 4\end{bmatrix}$ and $T = \begin{bmatrix}2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2\end{bmatrix}$

Positive semidefinite matrices have all $\lambda \ge 0$ and all $x^TSx \ge 0$. Those weak inequalities ($\ge$ instead of $>$) include positive definite $S$ and also the singular matrices at the edge.

@(Application)[The Ellipse]

  1. The tilted ellipse is associated with $S$. Its equation is $x^TSx = 1$.
  2. The lined-up ellipse is associated with $\varLambda$. Its equation is $X^T\varLambda X = 1$.
  3. The rotation matrix that lines up the ellipse is the eigenvector matrix $Q$.

The tilted ellipse $ax^2 + 2bxy + cy^2 = 1$:
$$\begin{bmatrix}x & y\end{bmatrix}\begin{bmatrix}a & b \\ b & c\end{bmatrix}\begin{bmatrix}x \\ y\end{bmatrix} = 1 \to x^TSx = 1 \text{ with } S = \begin{bmatrix}a & b \\ b & c\end{bmatrix}$$
Example: find the axes of the tilted ellipse $5x^2 + 8xy + 5y^2 = 1$.
$$S = \begin{bmatrix}5 & 4 \\ 4 & 5\end{bmatrix}\to \lambda_1 = 9,\ \lambda_2 = 1 \to \text{eigenvectors } \begin{bmatrix}1\\1\end{bmatrix}, \begin{bmatrix}1\\-1\end{bmatrix}$$
$$\begin{bmatrix}5 & 4 \\ 4& 5\end{bmatrix} = \frac{1}{\sqrt{2}}\begin{bmatrix}1 & 1\\1 & -1\end{bmatrix}\begin{bmatrix}9 & 0\\0 &1\end{bmatrix}\frac{1}{\sqrt{2}}\begin{bmatrix}1 & 1\\1 & -1\end{bmatrix} = Q\varLambda Q^T$$

Now multiply by $\begin{bmatrix}x & y\end{bmatrix}$ on the left and $\begin{bmatrix}x & y\end{bmatrix}^T$ on the right to get $x^TSx=(x^TQ)\varLambda(Q^Tx)$:
$$x^TSx = \text{sum of squares} \to 5x^2 + 8xy + 5y^2 = 9\left(\frac{x+y}{\sqrt{2}}\right)^2 + 1\left(\frac{x - y}{\sqrt{2}}\right)^2$$

The coefficients are the eigenvalues 9 and 1 from $\varLambda$. Inside the squares are the eigenvectors $q_1 =(1,1)/\sqrt{2}$ and $q_2 =(1,-1)/\sqrt{2}$.
The axes of the tilted ellipse point along those eigenvectors. This explains why $S = Q\varLambda Q^T$ is called the "principal axis theorem"—it displays the axes.

$S= Q\varLambda Q^T$ is positive definite when all $\lambda_i > 0$. The graph of $x^TSx = 1$ is an ellipse:
$$\text{Ellipse: } \begin{bmatrix}x & y\end{bmatrix}Q\varLambda Q^T\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}X&Y\end{bmatrix}\varLambda\begin{bmatrix}X\\Y\end{bmatrix} = \lambda_1X^2 + \lambda_2Y^2 = 1$$
The axes point along eigenvectors of $S$. The half-lengths are $\frac{1}{\sqrt{\lambda_1}}$ and $\frac{1}{\sqrt{\lambda_2}}$.
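
A numeric version of this example (`eigh` on $S = \begin{bmatrix}5&4\\4&5\end{bmatrix}$; the half-lengths $1/\sqrt{\lambda}$ come straight from the eigenvalues):

```python
import numpy as np

S = np.array([[5., 4.],
              [4., 5.]])

lam, Q = np.linalg.eigh(S)            # eigh sorts ascending: lam = [1, 9]
print(lam)                            # [1. 9.]
print(1 / np.sqrt(lam))               # half-lengths of the ellipse axes: 1 and 1/3

# check the principal-axis identity  5x^2 + 8xy + 5y^2 = lam1*X^2 + lam2*Y^2
x = np.array([0.3, -0.7])
X = Q.T @ x                           # rotated coordinates
print(np.isclose(x @ S @ x, lam @ X**2))   # True
```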

(Figure: the tilted ellipse lined up by the rotation $Q$.)

Singular Value Decomposition (SVD)

Image Processing by Linear Algebra

  1. The singular value theorem for $A$ is the eigenvalue theorem for $A^TA$ and $AA^T$.

$A$ has two sets of singular vectors (the eigenvectors of $A^TA$ and $AA^T$). There is one set of positive singular values (because $A^TA$ has the same positive eigenvalues as $AA^T$). $A$ is often rectangular, but $A^TA$ and $AA^T$ are square, symmetric, and positive semidefinite.

  1. The Singular Value Decomposition (SVD) separates any matrix into simple pieces.

Each piece is a column vector times a row vector.

  1. Use the eigenvectors $u$ of $AA^T$ and the eigenvectors $v$ of $A^TA$.

Since $AA^T$ and $A^TA$ are automatically symmetric (but not usually equal!) the $u$'s will be one orthogonal set and the eigenvectors $v$ will be another orthogonal set. We can and will make them all unit vectors: $||u_i|| = 1$ and $||v_i|| = 1$. Then our rank 2 matrix will be $A = \sigma_1u_1v_1^T + \sigma_2u_2v_2^T$. The size of the numbers $\sigma_1$ and $\sigma_2$ decides whether they can be ignored in compression: we keep larger $\sigma$'s and discard small $\sigma$'s.
The $u$'s from the SVD are called left singular vectors (unit eigenvectors of $AA^T$).
The $v$'s from the SVD are called right singular vectors (unit eigenvectors of $A^TA$).
The $\sigma$'s are singular values, square roots ($\sqrt \lambda$) of the equal eigenvalues of $AA^T$ and $A^TA$:
$$\text{Choices from the SVD: }\quad AA^Tu_i = \sigma_i^2u_i \qquad A^TAv_i = \sigma_i^2v_i \qquad Av_i = \sigma_iu_i$$
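
These three relations can be checked directly with `np.linalg.svd` (made-up rectangular matrix):

```python
import numpy as np

A = np.array([[3., 1., 1.],
              [-1., 3., 1.]])          # 2x3, rank 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T

for i in range(len(s)):
    print(np.allclose(A @ V[:, i], s[i] * U[:, i]),          # A v_i = sigma_i u_i
          np.allclose(A.T @ A @ V[:, i], s[i]**2 * V[:, i]), # A^T A v_i = sigma_i^2 v_i
          np.allclose(A @ A.T @ U[:, i], s[i]**2 * U[:, i])) # A A^T u_i = sigma_i^2 u_i
```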

Bases and Matrices in the SVD

$A$ is any $m$ by $n$ matrix, square or rectangular. Its rank is $r$. We will diagonalize this $A$, but not by $X^{-1}AX$. The eigenvectors in $X$ have three big problems:

  • They are usually not orthogonal,
  • there are not always enough eigenvectors,
  • and $Ax = \lambda x$ requires $A$ to be a square matrix.

The singular vectors of $A$ solve all those problems in a perfect way.

  1. The SVD produces orthonormal bases of $v$'s and $u$'s for the four fundamental subspaces.

There are two sets of singular vectors, $u$'s and $v$'s. The $u$'s are in $R^m$ and the $v$'s are in $R^n$. They will be the columns of an $m$ by $m$ matrix $U$ and an $n$ by $n$ matrix $V$.
The $u$'s and $v$'s give bases for the four fundamental subspaces:

$u_1,\cdots,u_r$ is an orthonormal basis for the column space $C(A)$.
$u_{r+1},\cdots,u_m$ is an orthonormal basis for the left nullspace $N(A^T)$.
$v_1,\cdots,v_r$ is an orthonormal basis for the row space $C(A^T)$.
$v_{r+1},\cdots,v_n$ is an orthonormal basis for the nullspace $N(A)$.

More than just orthogonality, these basis vectors diagonalize the matrix $A$:

$$\text{A is diagonalized:} \quad Av_1 = \sigma_1u_1 \quad Av_2 = \sigma_2u_2 \quad\cdots\quad Av_r = \sigma_ru_r$$

Those singular values $\sigma_1$ to $\sigma_r$ will be positive numbers: $\sigma_i$ is the length of $Av_i$. The $\sigma$'s go into a diagonal matrix that is otherwise zero. That matrix is $\Sigma$.

Since the $u$'s are orthonormal, the matrix $U_r$ with those $r$ columns has $U_r^TU_r = I$. Since the $v$'s are orthonormal, the matrix $V_r$ has $V_r^TV_r = I$. Then the equations $Av_i=\sigma_iu_i$ tell us column by column that $AV_r =U_r\Sigma_r$:

$$AV_r = U_r\Sigma_r \to A\begin{bmatrix}v_1 & \cdots & v_r\end{bmatrix} = \begin{bmatrix}u_1 & \cdots & u_r\end{bmatrix} \begin{bmatrix}\sigma_1 & & \\ & \ddots & \\ & & \sigma_r\end{bmatrix}$$

This is the heart of the SVD, but there is more. Those $v$'s and $u$'s account for the row space and column space of $A$. We have $n - r$ more $v$'s and $m - r$ more $u$'s, from the nullspace $N(A)$ and the left nullspace $N(A^T)$. They are automatically orthogonal to the first $v$'s and $u$'s (because the whole nullspaces are orthogonal). We now include all the $v$'s and $u$'s in $V$ and $U$, so these matrices become square. We still have $AV = U\Sigma$:

$$AV = U\Sigma \to A\begin{bmatrix}v_1 & \cdots & v_r & \cdots & v_n\end{bmatrix} = \begin{bmatrix}u_1 & \cdots & u_r & \cdots & u_m\end{bmatrix} \begin{bmatrix}\sigma_1 & & & \\ & \ddots & & \\ & & \sigma_r & \\ & & & \end{bmatrix}$$

The new $\Sigma$ is $m$ by $n$. It is just the $r$ by $r$ matrix in the equation above with $m-r$ extra zero rows and $n - r$ new zero columns. The real change is in the shapes of $U$ and $V$. Those are square matrices and $V^{-1}= V^T$. So $AV= U\Sigma$ becomes $A= U\Sigma V^T$. This is the Singular Value Decomposition:

$$\text{SVD:}\quad A = U\Sigma V^T = u_1\sigma_1v_1^T + \cdots + u_r\sigma_rv_r^T$$

It is crucial that each $\sigma_i^2$ is an eigenvalue of $A^TA$ and also of $AA^T$. When we put the singular values in descending order, $\sigma_1 \ge \sigma_2\ge \cdots \ge \sigma_r > 0$, the splitting above gives the $r$ rank-one pieces of $A$ in order of importance.

Example: find the matrices $U, \Sigma, V$ for $A = \begin{bmatrix}3 & 0 \\ 4 & 5\end{bmatrix}$.

$$A^TA = \begin{bmatrix}25 & 20 \\ 20 & 25\end{bmatrix}, \quad AA^T = \begin{bmatrix}9 & 12 \\ 12 & 41\end{bmatrix} \to \lambda_1 = 45,\ \lambda_2 = 5$$
$$\to \sigma_1^2 = \lambda_1 = 45,\quad \sigma_2^2 = \lambda_2 = 5$$
$$\to \text{eigenvectors of } A^TA:\ v_1 = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 1\end{bmatrix},\ v_2 = \frac{1}{\sqrt{2}}\begin{bmatrix} -1 \\ 1\end{bmatrix}$$
$$\to u_i = \frac{Av_i}{\sigma_i} \to u_1 = \frac{1}{\sqrt{10}}\begin{bmatrix} 1 \\ 3\end{bmatrix},\ u_2 = \frac{1}{\sqrt{10}}\begin{bmatrix} -3 \\ 1\end{bmatrix}$$
$$\to U = \frac{1}{\sqrt{10}}\begin{bmatrix} 1 & -3 \\ 3&1\end{bmatrix},\quad \Sigma = \begin{bmatrix} \sqrt{45} & \\ & \sqrt{5}\end{bmatrix},\quad V = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & -1 \\ 1&1\end{bmatrix}$$
$$\to AV = U\Sigma \to A = U\Sigma V^T \to A = \sigma_1u_1v_1^T + \sigma_2u_2v_2^T$$
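
Checking this example numerically (note that `np.linalg.svd` may flip the signs of a singular-vector pair, so compare $U\Sigma V^T$ rather than the individual factors):

```python
import numpy as np

A = np.array([[3., 0.],
              [4., 5.]])

U, s, Vt = np.linalg.svd(A)
print(s**2)                                # [45. 5.]  = eigenvalues of A^T A
print(np.allclose(A, U @ np.diag(s) @ Vt)) # A = U Sigma V^T
print(np.allclose(A, sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(2))))
```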

Singular Vectors of $A$ and Eigenvectors of $S = A^TA$

$$\text{Symmetric } S:\quad S = Q\varLambda Q^T = \lambda_1q_1q_1^T + \lambda_2q_2q_2^T + \cdots + \lambda_rq_rq_r^T$$
$$\text{Any matrix } A:\quad A = U\Sigma V^T = \sigma_1u_1v_1^T + \sigma_2u_2v_2^T + \cdots + \sigma_ru_rv_r^T$$

Principal Component Analysis (PCA by the SVD)

This section explains a major application of the SVD to statistics and data analysis.
PCA gives a way to understand a data plot in dimension $m =$ the number of measured variables (here age and height). Subtract the average age and height ($m = 2$ for $n$ samples) to center the $m$ by $n$ data matrix $A$. The crucial connection to linear algebra is in the singular values and singular vectors of $A$. Those come from the eigenvalues $\lambda=\sigma^2$ and the eigenvectors $u$ of the sample covariance matrix $S = AA^T/(n - 1)$. (A numeric sketch follows the list below.)

  1. The total variance in the data is the sum of all eigenvalues and of the sample variances $s^2$: $\text{Total variance } T = \sigma_1^2 + \cdots + \sigma_m^2= s_1^2 + \cdots + s_m^2 = \text{trace (diagonal sum)}$.
  2. The first eigenvector $u_1$ of $S$ points in the most significant direction of the data. That direction accounts for (or explains) a fraction $\sigma_1^2/T$ of the total variance.
  3. The next eigenvector $u_2$ (orthogonal to $u_1$) accounts for a smaller fraction $\sigma_2^2/T$.
  4. Stop when those fractions are small. You have the $R$ directions that explain most of the data. The $n$ data points are very near an $R$-dimensional subspace with basis $u_1$ to $u_R$—these $u$'s are the principal components in $m$-dimensional space.
  5. $R$ is the "effective rank" of $A$. The true rank $r$ is probably $m$ or $n$: a full-rank matrix.
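
A minimal PCA-by-SVD sketch under the conventions above (rows = variables, columns = samples; the data are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
age = rng.uniform(20, 60, n)
height = 150 + 0.4 * age + rng.normal(0, 2, n)   # a correlated second variable

A = np.vstack([age, height])                     # m x n data matrix (m = 2 variables)
A = A - A.mean(axis=1, keepdims=True)            # center each row (variable)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
eigvals = s**2 / (n - 1)                         # eigenvalues of S = A A^T / (n - 1)
T = eigvals.sum()                                # total variance (trace of S)

print(eigvals / T)          # fraction of variance explained by each direction
print(U[:, 0])              # u_1: the first principal component in R^m
```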


Reference