Linear Algebra, Spring 2023

Notes will be updated as we go along. For course info e.g. exam (dates, time and location), grade distribution, etc, please check the course website .

Office hours : MWF 10:00 - 11:00 WH335 (or by appointment)

For my students in Math 304, this note is only complementary to your book, any materials on the book is fair game for the exam. This note is not a replacement for the book.

Linear Systems & Matrices

Matrices and Vectors

A matrix is a rectangular array of numbers arranged in rows and columns. $$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \vdots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}$$ If the matrix has $m$ rows and $n$ columns, then the size of the matrix is $m \times n$.

The set of all matrices with size $m \times n$ will be denoted by $\R^{m\times n}$.

A column vector or vector is a matrix with only one column. $$b = \begin{bmatrix} b_{1} \\ b_{2} \\ \vdots \\ b_{m} \end{bmatrix}$$

The set of all vectors with size $m$ will be denoted by $\R^m$.

Addition and Scalar Multiplication

Addition of two matrices of the same size is defined by adding the corresponding entries. Where the $i,j$-th entry of the sum is given by $$(A+B)_{ij} = A_{ij} + B_{ij}$$ Since vectors are matrices with only one column, the same definition applies to vectors.

Scalar Multiplication of a matrix by a scalar is defined by multiplying each entry by the scalar. $$(cA)_{ij} = cA_{ij}$$ The same definition applies to vectors.

Matrix Multiplication

Matrix Multiplication of a matrix $A$ with size $m \times n$ and a matrix $B$ with size $n \times p$ gives a matrix $C$ with size $m \times p$. $$ \underbrace{A}_{{\color{green}m}\times \color{red}{n}} \underbrace{B}_{{\color{red}n} \times {\color{green}p}} = \underbrace{C}_{ {\color{green}m} \times {\color{green}p}} $$ And the $i,j$-th entry of $C$ is given by $$C_{ij} = \sum_{k=1}^n A_{ik}B_{kj} = A_{i,:} \cdot B_{:,j}$$ Where $A_{i,:}$ is the $i$-th row of $A$ and $B_{:,j}$ is the $j$-th column of $B$.

$$\begin{bmatrix} 1 & 2 \\ \color{green}3 & \color{green} 4 \\ 5 & 6 \end{bmatrix} \begin{bmatrix} 1 & 2 & \color{green}3 \\ 4 & 5 & \color{green}6 \end{bmatrix} = \begin{bmatrix} 9 & 12 & 15 \\ 19 & 26 & \color{green}33 \\ 29 & 40 & 51 \end{bmatrix}$$ Here $$ \begin{align*} C_{2,3} = A_{2,:}\cdot B_{:,3} &= \begin{bmatrix}3 & 4\end{bmatrix} \cdot \begin{bmatrix}3 \\ 6\end{bmatrix}\\& = 3 \cdot 3 + 4 \cdot 6 = 33 \end{align*} $$

Linear Systems

Linear Equations

A linear equation is an equation of the form $$a_1x_1 + a_2x_2 + \cdots + a_nx_n = b$$ where $a_1, a_2, \cdots, a_n$ and $b$ are real numbers.

$x_1, x_2, \cdots, x_n$ are the variables of the equation, and $a_1, a_2, \cdots, a_n$ are the coefficients, and $b$ is the constant. In terms of dot product, a linear equation can be written as $$\vc{a} \cdot \vc{x} = b$$ where $\vc{a} = (a_1, a_2, \cdots, a_n)$ and $\vc{x} = (x_1, x_2, \cdots, x_n)$. A solution to a linear equation in $n$ variables is a vector $\vc{x} = (x_1, x_2, \cdots, x_n)$ such that the equation is satisfied. That is, $\vc{a}\cdot \vc{x} =b$. The solution (set), to a linear equation in $n$ variables is a $(n-1)$ dimensional hyperplane.

The solution set to a linear equation in $2$ variables, $ax+by=c$, is a line in $\R^2$.
The solution set to a linear equation in $3$ variables, $ax+by+cz = d$, is a plane in $\R^3$.

Linear Systems

Given a linear system of $m$ equations, the solution (set) to the system is geometrically the intersection of the $m$ solution sets to the $m$ equations.

Algebraically, the solution set is usually obtained through a process of row operations applying to the augmented matrix of the system. Row Operations are the following:

Interchange two rows.
Multiply a row by a nonzero scalar.
Add a multiple of one row to another row.

Two matrices are row equivalent if one can be obtained from the other by a sequence of row operations.

$$\underbrace{\begin{array}{rr} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n & = & b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n & = & b_2 \\ \vdots \vdots \vdots\\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n & = & b_m \end{array}}_{\text{$m$ equations in $n$ variables}} \quad \xrightarrow{\left[A\mid b\right]} \quad \underbrace{\left[\begin{array}{cccc|c} a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\ a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} & b_m \end{array}\right]}_{\text{argumented matrix}} \quad \xrightarrow[\text{Forward Elimination}]{ \text{Gaussian}} \quad \underbrace{ \left[\begin{array}{cccc|c} * & * & \cdots & * & * \\ 0 & * & \cdots & * & * \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ 0 & 0 & \cdots & * & * \end{array}\right] }_{\text{Row Echelon Form}} \quad \xrightarrow[\text{Forward Elimination}]{ \text{Gauss-Jordan}}\quad \underbrace{ \left[\begin{array}{cccc|c} 1 & * & \cdots & * & * \\ 0 & 1 & \cdots & * & * \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & * \end{array}\right] }_{\text{Reduced Row Echelon Form}} \quad \xrightarrow[\text{Back Elimination}]{ \text{Gauss-Jordan}} \quad \underbrace{ \left[\begin{array}{cccc|c} 1 & 0 & \cdots & 0 & * \\ 0 & 1 & \cdots & 0 & * \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & * \end{array}\right] }_{\text{Identity Matrix}} $$

When solving a system of linear equations, there are three possible outcomes:

There is an unique solution.
There are infinitely many solutions.
The system is inconsistent.

In the notation of matrix multiplication, a system of linear equations is essentially a matrix equation. $$\underbrace{\begin{array}{rr} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n & = & b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n & = & b_2 \\ \vdots \vdots \vdots\\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n & = & b_m \end{array}}_{\text{$m$ equations in $n$ variables}}\Longrightarrow Ax=b$$ where $A$ is the $m\times n$ matrix of coefficients, $x \in \R^n$ is the vector of variables and $b\in \R^m$ is the vector of constants. In matrix notation, it is clear that a homogeneous system of linear equations, i.e. a system of the form $Ax=\vc{0}$, always has a solution, the zero vector $\vc{0}$, since $A \vc{0} = \vc{0}$.

If $v$ is a solution to the homogeneous system $Ax=0$, and $w$ is a solution to the non-homogeneous system $Ax=b$, then $v+w$ is another solution to $Ax=b$ since $$A(v+w) = Av + Aw = 0+b=b$$ Given a matrix equation, it will be nice if solving the problem is as easy as $$Ax=b \implies x = A^{-1}b$$

Matrix Inversion

A square matrix $A$ is invertible if there exist a matrix $M$ such that $M A=I$.
And $M$ is called the inverse matrix of $A$, denoted by $A^{-1}$.

To find the inverse of a matrix, we can use the Gauss-Jordan Elimination method. $$ [M|I] \quad \xrightarrow[\text{ Elimination}]{ \text{Gauss-Jordan}}\quad [I|M^{-1}] $$ An elementary matrix is a matrix that can be obtained from the identity matrix by a single elementary row operation.

In each step, if we record the corresponding elementary matrix, we can express the inverse of the matrix as a product of elementary matrices. Consider finding the inverse of the matrix $$M= \begin{bmatrix} 1& 2 & 0 \\ 2 & 1 & 3 \\ 0 & 1 & 9 \end{bmatrix}$$ We use the Gauss-Jordan Elimination method to find the inverse of $M$, and record the elementary matrices. $$ \left[ \begin{array}{rrr|rrr} 1 & 2 & 0 & 1 & 0 & 0 \\ 2 & 1 & 3 & 0 & 1 & 0 \\ 0 & 1 & 9 & 0 & 0 & 1 \end{array}\right] \quad \xrightarrow[ E_1 = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} ]{-2R_1+R_2\to R_2} \left[ \begin{array}{rrr|rrr} 1 & 2 & 0 & 1 & 0 & 0 \\ 0 & -3 & 3 & -2 & 1 & 0 \\ 0 & 1 & 9 & 0 & 0 & 1 \end{array}\right] \quad \xrightarrow[ E_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -\frac{1}{3} & 0 \\ 0 & 0 & 1 \end{bmatrix} ]{-3R_2\to R_2} \left[ \begin{array}{rrr|rrr} 1 & 2 & 0 & 1 & 0 & 0 \\ 0 & 1 & -1 & \frac{2}{3} & \frac{-1}{3} & 0 \\ 0 & 1 & 9 & 0 & 0 & 1 \end{array}\right] \quad \xrightarrow[ E_3 = \begin{bmatrix} 1 & -2 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} ]{-2R_2+R_1\to R_1} \left[ \begin{array}{rrr|rrr} 1 & 0 & 2 & \frac{-1}{3} & \frac{2}{3} & 0 \\ 0 & 1 & -1 & \frac{2}{3} & \frac{-1}{3} & 0 \\ 0 & 1 & 9 & 0 & 0 & 1 \end{array}\right] \quad \xrightarrow[ E_4 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix} ] {-R_2+R_3\to R_3} \left[ \begin{array}{rrr|rrr} 1 & 0 & 2 & \frac{-1}{3} & \frac{2}{3} & 0 \\ 0 & 1 & -1 & \frac{2}{3} & \frac{-1}{3} & 0 \\ 0 & 0 & 10 & \frac{-2}{3} & \frac{1}{3} & 1 \end{array}\right] \quad \xrightarrow[ E_5 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \frac{1}{10} \end{bmatrix} ] {\frac{1}{10}R_3\to R_3} \left[ \begin{array}{rrr|rrr} 1 & 0 & 2 & \frac{-1}{3} & \frac{2}{3} & 0 \\ 0 & 1 & -1 & \frac{2}{3} & \frac{-1}{3} & 0 \\ 0 & 0 & 1 & \frac{-1}{15} & \frac{1}{30} & \frac{1}{10} \end{array}\right] \quad \xrightarrow[ E_6 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix} ] {R_3+R_2\to R_2} \left[ \begin{array}{rrr|rrr} 1 & 0 & 2 & \frac{-1}{3} & \frac{2}{3} & 0 \\ 0 & 1 & 0 & \frac{3}{5} & \frac{-3}{10} & \frac{1}{10} \\ 0 & 0 & 1 & \frac{-1}{15} & \frac{1}{30} & \frac{1}{10} \end{array}\right] \quad \xrightarrow[ E_7 = \begin{bmatrix} 1 & 0 & -2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} ] {-2R_3+R_1\to R_1} \left[ \begin{array}{rrr|rrr} 1 & 0 & 0 & \frac{-1}{5} & \frac{3}{5} & \frac{-1}{5} \\ 0 & 1 & 0 & \frac{3}{5} & \frac{-3}{10} & \frac{1}{10} \\ 0 & 0 & 1 & \frac{-1}{15} & \frac{1}{30} & \frac{1}{10} \end{array}\right] $$ Thus $$ M^{-1}= E_7E_6E_5E_4E_3E_2E_1 = \begin{bmatrix} \frac{-1}{5} & \frac{3}{5} & \frac{-1}{5} \\ \frac{3}{5} & \frac{-3}{10} & \frac{1}{10} \\ \frac{-1}{15} & \frac{1}{30} & \frac{1}{10} \end{bmatrix} $$

A matrix is invertible $$\iff$$ it is row equivalent to the identity matrix.
$$(AB)^{-1}=B^{-1}A^{-1}$$ $$(A^T)^{-1}=(A^{-1})^T$$
$$(E_{R_i\leftrightarrow R_j})^{-1}=E_{R_i\leftrightarrow R_j}$$ $$(E_{cR_i\rightarrow R_i})^{-1}=E_{\frac{1}{c}R_i\rightarrow R_i}$$ $$ (E_{cR_i+R_j\rightarrow R_j})^{-1} = E_{-cR_i+R_j\rightarrow R_j} $$

Schur's Complement

Let $A$ be a square matrix with size $n\times n$, and partition $A$ into four blocks $$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$$ where $A_{11}$ is a $k\times k$ matrix, The Schur's complement of $A_{11},A_{22}$ in $A$ are the matrices $$ A/A_{11}=A_{22}-A_{21}A_{11}^{-1}A_{12} $$ $$ A/A_{22}=A_{11}-A_{12}A_{22}^{-1}A_{21} $$

$$ A^{-1} = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}^{-1} = \begin{bmatrix} (A/A_{22})^{-1} & -(A/A_{22})^{-1}A_{12}A_{22}^{-1} \\ -(A/A_{11})^{-1}A_{21}A_{11} & (A/A_{11})^{-1} \end{bmatrix} $$ If $\Sigma=\begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}$ is the covariance matrix of a random vector $\vc{x}=\begin{bmatrix} \vc{x}_1 \\ \vc{x}_2 \end{bmatrix}$, then the conditional covariance of $\vc{x}_1$ given $\vc{x}_2$ is $$ \Sigma_{1|2} = \Sigma_{11}-\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} = \Sigma / \Sigma_{22} $$

Vector Spaces

Euclidean Spaces

Vector addition and scalar multiplication defined for $\R^n$, turns $\R^n$ into a vector space, in the sense that $(\R^n,+,\cdot)$ satisfies the following properties: For all $\mathbf{u},\mathbf{v},\mathbf{w}\in \R^n$ and $k,l\in \R$:

Commutativity$$\mathbf{u}+\mathbf{v}=\mathbf{v}+\mathbf{u}$$
Associativities$$(\mathbf{u}+\mathbf{v})+\mathbf{w}=\mathbf{u}+(\mathbf{v}+\mathbf{w})$$ $$ k(l\mathbf{v})=(k l)\mathbf{v}$$
Existence of identities $$\mathbf{v}+\mathbf{0}=\mathbf{v} $$ $$1\mathbf{v}=\mathbf{v} $$
Existence of additive inverse$$\exists \mathbf{-v}\in V:\quad \mathbf{v}\oplus(-\mathbf{v})=\mathbf{0}$$
Distributivity$$k(\mathbf{u}+\mathbf{v})=(k \mathbf{u})+( k \mathbf{v})$$ $$(k+l) \mathbf{v}=(k \mathbf{v})+( l \mathbf{v})$$

$\R^n$ is the blueprint for all vector spaces. The above properties will be taken as the definition for general vector spaces.

Vector Spaces

Now we abstract the idea of an Euclidean space to a more general concept of a vector space by taking properties of Euclidean spacones as the definition of vector spaces. A set $V$ together with an addition $$ \oplus: V\times V \to V $$ and a scalar multiplication $$ \otimes:\R \times V \to V $$ defined on $ V $ is a vector space if, for $\mathbf{u}, \mathbf{v},$ and $\mathbf{w}$ in $ V $, and $k$ and $l$ in $ \mathbb{R} $, the following rules hold:

Commutativity$$\mathbf{u}\oplus\mathbf{v}=\mathbf{v}\oplus\mathbf{u}$$
Associativities$$(\mathbf{u}\oplus\mathbf{v})\oplus\mathbf{w}=\mathbf{u}\oplus(\mathbf{v}\oplus\mathbf{w})$$ $$ k\otimes(l\otimes \mathbf{v})=(k l)\otimes \mathbf{v}$$
Existence of identities$$\exists \mathbf{ 0}\in V :\qquad\mathbf{v}\oplus\mathbf{0}=\mathbf{v} $$ $$1\otimes\mathbf{v}=\mathbf{v} $$
Existence of additive inverse$$\exists \mathbf{-v}\in V:\quad \mathbf{v}\oplus(-\mathbf{v})=\mathbf{0}$$
Distributivity$$k\otimes(\mathbf{u}\oplus\mathbf{v})=(k\otimes \mathbf{u})\oplus( k\otimes \mathbf{v})$$ $$(k+l)\otimes \mathbf{v}=(k\otimes \mathbf{v})\oplus( l\otimes \mathbf{v})$$

$(\mathbb{R},\oplus,\otimes)$ is a vector space, with $$u\oplus v=u+v\qquad u,v\in \mathbb{R} $$$$ \lambda\otimes u=\lambda u\qquad \lambda\in \mathbb{R},v\in \mathbb{R} $$
Denote the set of all polynomials of degree at most $ n $ on $ \R $ \[ \ms{P}_{n}:=\{a_nx^{n}+a_{n-1}x^{n-1}+\cdots+a_1x+a_0:a_i\in R \} \] The set of all polynomials is denoted by $ \ms{P} $.

Both $ \ms{P}_{n} $ and $ \ms{P} $ are vector spaces, with operations $$ f\oplus g:x\mapsto f(x)+g(x)\qquad f,g\in \ms{P} $$ $$ \lambda\otimes f:x\mapsto \lambda f(x) \qquad \lambda\in \R,f\in \ms{P} $$
Denote $\mathbb{R}^{S}:= \{f\mid f:S\to \R \}$, then $(\mathbb{R}^{S},\oplus,\otimes)$ is a vector space, where $$ f\oplus g:x\mapsto f(x)+g(x)\qquad f,g\in \mathbb{R}^{S} $$ $$ \lambda\otimes f:x\mapsto \lambda f(x)\qquad \lambda\in \mathbb{R},f\in \mathbb{R}^{S} $$
$(\R^{n\times m}, \oplus, \otimes)$ is a vector space, with $$ \begin{aligned} &(A \oplus B)_{i j}=A_{i j}+B_{i j} \quad A, B \in \R^{n\times m} \\ &(\lambda \otimes A)_{i j}=\lambda A_{i j} \quad \lambda \in \mathbb{R}, A \in \R^{n\times m} \end{aligned} $$

Direct Sums

Given vector spaces, it is possible to construct a new larger vector space. Let $ \ms{ V }$ and $ \ms{ W }$ be vector spaces. The direct sum of $ \ms{ V }$ and $ \ms{ W }$ is the vector space $$ \ms{ V } \bigoplus \ms{ W } = \{ (v,w) \mid v \in \ms{ V }, w \in \ms{ W } \} $$ with addition and scalar multiplication defined by $$ (v,w) \oplus (v',w') = (v \oplus_\ms{V} v', w \oplus_\ms{W} w') $$ $$ \lambda \otimes (v,w) = (\lambda \otimes_\ms{V} v, \lambda \otimes_\ms{W} w) $$

Direct sums of vector spaces are vector spaces.

The Euclidean spaces are essentially direct sums of $R$ $$\R^n =\bigoplus_{i=1}^{n}\R $$

Spans & Subspaces

Subspaces

Let $ \ms{ V }$ be a vector space. Let $ \ms{ S} \subset \ms{ V} $ be a nonempty subset.
If $\mathscr{S}$ is also a vector space over $\R$ with the same operations, then $\mathscr{S}$ is said to be a subspace of $\mathscr{V} .$

Instead of checking all the properties of a vector space,
it is sufficient to check that the subset is closed under the operations A nonempty subset $\mathcal{S}$ of a vector space $\mathcal{V}$ is a subspace of $\mathcal{V}$ iff

$\forall \mathbf{x}, \mathbf{y} \in \mathcal{S} \quad \Longrightarrow \quad \mathbf{x}+\mathbf{y} \in \mathcal{S}$
$\forall \mathbf{x} \in \mathcal{S},\alpha\in \R \quad \Longrightarrow \quad \alpha \mathbf{x} \in \mathcal{S}$

- Any line passing through the origin is a subspace of $\R^2$.
- Any line or plane passing through the origin is a subspace of $\R^3$.
  In fact, these are almost all the subspaces of $\R^3$, with just $2$ exceptions.
$$ \ms{P}_{n}\subspace \ms{P} \subspace \R^{\R}$$
- Denote the subset of continuous functions $$ C(\R)=\{f\in \R^{\R} : f\text{ is continuous}\}$$ $$ C(\R) \subspace \R^{\R} $$
- Denote the subset of differentiable functions $$C^1(\R)=\{f\in \R^{\R} : f\text{ is differentiable}\}$$ $$C^1(\R)\subspace C(\R) \subspace \R^{\R} $$
The transpose $ A^{T} $ of a $n \times m$ matrix $A$ is the $m \times n$ matrix \[ (A^{T})_{i j}=A_{j i} \] A square matrix, $H$, is symmetric if $H=H^T$. The subset of all Symmetric Matrices is a subspace of $\mathcal{M}(n,n)$.

Linear Combination and Span

A linear combination of vectors $\ms{B}= \{v_1,\cdots,v_m\} $ in $ \ms{V} $, is an object of the form \[ x=a_1 v_1+\cdots+a_n v_m,\qquad a_i\in \R \]
The set of all linear combinations of $ \ms{B} $ is called the span of $ \ms{B} $, denoted by $ \op{Span}(\ms{B}) $.
If $ \ms{V}=\op{Span}(\ms{B}) $, we say $ \ms{B} $ spans $ \ms{ V} $ or $\ms{B}$ is a spanning set for $\ms{V}$.

In $ \R^{2} $, let $$ e_1=\begin{pmatrix} 1\\0 \end{pmatrix},e_2=\begin{pmatrix} 0\\1 \end{pmatrix} $$ Every vector in $ \R^{2} $ is a linear combination of $ \{e_1,e_2\} $.
The Span of any subset of vectors is a subspace
Let $V$ be vector space , and $S\subset V$ \[ \text{span}\left(S\right) \underset{subspace}{\subset} \R^n\]
\[ ex^3+3x^2+x+\pi \] is in the span of \[ S=\{x^5,x^4,x^3,x^2,x,1\} \]

Check whether a vector is in the span or whether a subset spans a space, it boils down to solving a non-homogenous linear system $$ \begin{pmatrix} a\\b\\c \end{pmatrix} \stackrel{?}{\in} \op{Span}(\{\begin{pmatrix} 1\\0\\0 \end{pmatrix},\begin{pmatrix} 1\\1\\-1 \end{pmatrix},\begin{pmatrix} -2\\3\\1 \end{pmatrix} \})\qquad \Longrightarrow\qquad \text{ ? $ \exists x,y,z\in \R $} : \begin{pmatrix} a\\b\\c \end{pmatrix}=x\begin{pmatrix} 1\\0\\0 \end{pmatrix}+y\begin{pmatrix} 1\\1\\-1 \end{pmatrix}+z\begin{pmatrix} -2\\3\\1 \end{pmatrix}\qquad \Longrightarrow\qquad \begin{array}{rr} x+y-2 z= & a \\ y+3 z= & b \\ -y-z= & c \end{array}$$ When checking for spanning, it may often be easier to try simplifying the system by using column operations first, since the span of the columns of a matrix doesn't change under column operations.

Linear Independence & Basis

Linear Independence

Let $ \ms{V} $ be a vector space.

$ \ms{B} \subset \ms{V} $ is linearly independent if no vector in $ \ms{B} $ is a linear combination of the other vectors in $ \ms{B} $. $ \ms{B}=\{v_1,\cdots,v_m \}\subset\ms{V} $ is linearly independent iff\[ a_1v_1+\cdots+a_nv_m=0\implies a_i=0 \forall i \]

Check whether a subset is linearly independent it boils down to solving a homogenous linear system \[ \{\begin{pmatrix} 1\\0\\0 \end{pmatrix},\begin{pmatrix} 1\\1\\-1 \end{pmatrix},\begin{pmatrix} -2\\3\\1 \end{pmatrix} \}\quad\text{Linearly Independent ? } \qquad \Longrightarrow\qquad \text{? $ x=y=z=0 $ the unique solution to } \begin{pmatrix} 0\\0\\0 \end{pmatrix}= x\begin{pmatrix} 1\\0\\0 \end{pmatrix}+y\begin{pmatrix} 1\\1\\-1 \end{pmatrix}+z\begin{pmatrix} -2\\3\\1 \end{pmatrix} \text{?} \qquad \Longrightarrow\qquad \begin{array}{rr} x+y-2 z= & 0 \\ y+3 z= & 0 \\ -y-z= & 0 \end{array}\] If there is only one way to get the zero vector from a linear combination of the vectors, then each vector contains some data that's not in the others. If otherwise, suppose there exist $x,y,z \ne 0$ $$x\vc{v}+y\vc{u}+z\vc{w} = 0 \implies \vc{w} =\dfrac{1}{z} (-x\vc{v}-y\vc{u}) $$ $\vc{w}$ is redundant in the sense that it can be expressed as a linear combination of the other vectors.

Basis

Let $ \ms{V} $ be a vector space. $ \ms{B} \subset \ms{V} $ is a basis of $V$, if $ \mathscr{ B} $ is linearly independent and spans $V$.

The number of elements in a basis is called the dimension of the space $ V $.

Let $ \ms{ B}_1, \ms{ B}_2 $ be two different basis of a vector space $ \ms{V} $. We must have \[ |\ms{B}_1|=| \ms{B}_2| \]
Every vector space has a basis.
Any linearly independent subset can be extended to a basis.
Any spanning set contains a basis.

- For the space $\R^{n} $, consider \[ \ms{B}=\{e_1,e_2,\cdots,e_n \} \] Where $ e_k $ be the column vector with the k-th entry $ 1 $ and all other entries $ 0 $.
  
  $ \ms{B} $ is is called the standard basis of $ \R^{n} $.
- \[ \{x^5,x^4,x^3,x^2,x,1 \} \] is the standard basis for $ \ms{P}_5 $. What is the dimension of $ \ms{P}_5 $?
- $$\{ \begin{bmatrix} 1&0\\ 0&0 \end{bmatrix}, \begin{bmatrix} 0&1\\ 0&0 \end{bmatrix} , \begin{bmatrix} 0&0\\ 1&0 \end{bmatrix} , \begin{bmatrix} 0&0\\ 0&1 \end{bmatrix}\} $$ is the standard basis for $\R^{2\times 2}$.
Consider $$Z = \{ p(x)\in \mathcal{P}_3 : p(7) = 0 \}\subspace \mathcal{P}_3$$ The condition $p(7)=0 $ implies, by the Fundamental Theorem of Algebra, that $$(x-7)$$ is a factor for any vector $p(x)$ in $Z$, i.e. $$ p(x)=c(x-7)(x-a)^i(x-b)^j,\quad i,j\in \{0,1\},c\in \R \qquad \forall p(x)\in Z $$ A basis for $Z$ will be $$\{x-7, x^2-7x,x^3-7x^2 \}$$
Let $ E $ be a subset of a vector space $\ms{V} $. $$\op{Span}(E)\subspace \ms{ V} $$ If $ E $ is linearly independent, then $ E $ is a basis for $\op{Span}(E) $.

To check whether a set of vectors is a basis for $\R^n$, is essentially to check if it is linearly independent and spans $\R^n$.

Internal Direct Sums

Given a vector space of dimension $n\ge 2$, it is possible to decompose it into small pieces. For a vector space $W$ and subspaces $U$ and $V$ of $W$.

$W$ is said to be the (internal) direct sum of $U$ and $V$ if $$\forall w\in W, \exists !\ ( u\in U,v\in V ): w=u+v$$ and we write $$W=U\bigoplus V$$

Let $\ms{E}\subset \R^{\R} $ be the subset of all even functions, and let $\ms{O}\subset \R^{\R} $ be the set of all odd functions. Then \[\R^{\R}=\ms{E}\bigoplus \ms{O} \] Since for each $f\in \R^\R$ $$f(x) = \underbrace{\dfrac{f(x)+f(-x)}{2}}_{\ms{E}} + \underbrace{\dfrac{f(x)-f(-x)}{2}}_{\ms{O}}$$
Let $$ Q_i = \{ cx^i : c\in \R \}$$ then $$ \ms{P}_n = \bigoplus_{i=0}^{n} Q_i $$
Let $\mc{A}= \{ v_1, \cdots , v_n \}$ be a basis for vector space $V$. Then $$ V = \span\{v_1,\cdots,v_m \} \bigoplus \span\{v_{m+1},\cdots,v_n \} $$ for any $m\in \{1,\cdots,n-1\}$.

Linear Maps

Definitions and Examples

Linear transformations

For any mathematical construction, it is important to study structure-preserving maps. For Euclidean spaces, the structure-preserving map is the linear map. Let $V$ and $U$ be vector spaces over $R$. A map $$F: V \rightarrow U$$ is called a linear map or linear transformation if

For any vectors $v, w \in V$, $F(v+w)=F(v)+F(w)$
For any scalar $k \in R$ and vector $v \in V$, $F(k v)=k F(v)$.

A linear map from a vector space $V $ to $ \R $ is called a linear functional.

Matrix-vector multiplication is Linear
Given a matrix $M$ of size $n\times m$, the map \[ x\mapsto Mx\qquad x\in \R^{m} \]is a linear map $ \R^{m}\to \R^{n} $.
- Projection is linear \[ F: \mathbb{R}^{3} \rightarrow \mathbb{R}^{3} \] \[ F(x, y, z)=(x, y, 0) \]
- Translation is not Linear \[ G:\R^{2}\to \R^{2} \]\[ G(x, y)=(x+1, y+2) \]
- Differentiation is linear \[ \op{D}:\mathcal{P}\to \mathcal{ P} \] \[ \op{D}f(x)=f'(x) \]
- Definite Integrals are linear functionals. $$\mathbf{J}: \ms{P} \rightarrow \mathbb{R}$$ \[ \mathbf{J}(f(t))=\int_{0}^{1} f(t) d t \]
The trace of a sqaure matrix $ M $ is defined as\[ \op{tr}(M)=\sum_{i=1}^{n}M_{ii} \]Consider it as a map between vector spaces\[ \op{tr}: \R^{n\times n} \to \R \] is a linear functional.

The technical jargon for a structure-preserving map is homomorphism.

You can change the matrix $M, v$ to see how the linear map $M$ acts on the standard basis vectors and $v$.

$M$

$v$

Isomorphisms

Let $F: V \rightarrow W$ be a linear map

$F$ is one-to-one if and only if $$F(v)=F(u) \implies v=u$$
$F$ is onto if and only if $$\forall w \in W, \exists v\in V : F(v)=w $$
$F$ is an isomorphism if it is both one-to-one and onto.

Consider the differential operator $$D: \mathcal{P}_n \to \mathcal{P}_{n-1}$$ \[ Df(x)=f'(x) \] For any $g \in \mathcal{P}_{n-1}$, let $G \in \mathcal{P}_n$ be an anti-derivative of $g$, then $D(G)=g$.
This shows that the map is onto.

The map is not one-to-one, since $$D(x+3) = D(x+4) = 1$$ but $x+3 \ne x+4$.
- The embedding $E: \R^{n} \rightarrow \R^{n+1}$ $$E\begin{pmatrix} v_1 & v_2 & \cdots & v_n \end{pmatrix}= \begin{pmatrix} v_1 & v_2 & \cdots & v_n & 0 \end{pmatrix}$$ $E$ is not onto since nothing gets map to $\begin{pmatrix} 1 & 1 & \cdots & 1 & 1\end{pmatrix}$.
- The projection $P: \R^{n+1} \rightarrow \R^{n}$ $$P\begin{pmatrix} v_1 & v_2 & \cdots & v_n & v_{n+1} \end{pmatrix}= \begin{pmatrix} v_1 & v_2 & \cdots & v_n \end{pmatrix}$$ is not one-to-one since $$P\begin{pmatrix}0& 0& 0& \cdots& 0& 1\end{pmatrix}=P\begin{pmatrix}0& 0& 0& \cdots& 0& 0\end{pmatrix}=\vc{0}$$ but $$\begin{pmatrix}0& 0& 0& \cdots& 0& 1\end{pmatrix}\ne \begin{pmatrix}0& 0& 0& \cdots& 0& 0\end{pmatrix}$$
Consider the transpose operator $$T: \R^{n\times m} \to \R^{m\times n}$$ \[ T(M)=M^T \] The map is onto, since for any $W \in \R^{m\times n}$, we have $W^T\in \R^{n\times m} $, and $$T(W^T)=W$$ The map is also one-to-one, if $T(M) = T(W)$, by linearity $$(M-W)^T=T(M-W) =T(M)-T(W) = \vc{0} $$ But the only matrix with transpose having all zeros is the zero matrix, thus $M=W$.

It follows that $T$ is an isomorphism.

Let $\mathcal{B}=\left\{\beta_1, \cdots, \beta_n\right\}$ be a basis of vector space $V$. Let $v \in V$, then $v$ can be expressed uniquely as $$v=c_1 \beta_1+\cdots+c_n \beta_n$$ Define a linear map $I_\mathcal{B}: V \rightarrow \R^{n}$ by $$I_\mathcal{B}(v)=\left(\begin{array}{c} c_1 \\ \vdots \\ c_n \end{array}\right)$$ Then $I_\mathcal{B}$ is a linear map from $V$ to $\R^{n}$ that is both one-to-one and onto. Thus $I_\mathcal{B}$ is an isomorphism from $V$ to $\R^{n}$. $$ v_{\mathcal{B}}:=\left(\begin{array}{c} c_1 \\ \vdots \\ c_n \end{array}\right) $$ is the column representation or coordinate vector of $v$ with respect to the basis $\mathcal{B}$.

Matrix Representation

Map between Euclidean Spaces

For any $L:\R^n\to\R^m$ is a linear map, there exists a matrix $M_L$ such that \[ L(\mathbf{u})=M_L\mathbf{u} \] Take the simple case of a linear map $L:\R^2\to\R^2$, suppose $L$ is represented by the matrix $M = \begin{bmatrix} m_{11} & m_{12} \\ m_{21} & m_{22} \end{bmatrix}$.
How do we find the matrix? I.e. what are the entrie $m_{11}$, $m_{12}$, $m_{21}$, $m_{22}$?

Consider applying $L$ to the basis vectors $\mathbf{e}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\mathbf{e}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$. Then, \[ L(\mathbf{e}_1) = \begin{bmatrix} m_{11} & m_{12} \\ m_{21} & m_{22} \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} m_{11} \\ m_{21} \end{bmatrix} \qquad \text{and} \qquad L(\mathbf{e}_2) = \begin{bmatrix} m_{11} & m_{12} \\ m_{21} & m_{22} \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} m_{12} \\ m_{22} \end{bmatrix} \] If for example $L$ is the linear map swaps the coordinates, $$ L(\begin{bmatrix}x_1\\x_2\end{bmatrix}) = \begin{bmatrix}x_2\\x_1\end{bmatrix} $$ then \[ L(\mathbf{e}_1) = \begin{bmatrix} m_{11} \\ m_{21} \end{bmatrix} = {\color{green}\begin{bmatrix} 0 \\ 1 \end{bmatrix}} \qquad \text{and} \qquad L(\mathbf{e}_2) = \begin{bmatrix} m_{12} \\ m_{22} \end{bmatrix}= {\color{red}\begin{bmatrix} 1 \\ 0 \end{bmatrix}} \] So the matrix is \[ M = \begin{bmatrix} {\color{green}0} & {\color{red}1} \\ {\color{green}1} & {\color{red}0} \end{bmatrix} \] This process can be generalized to any linear map $L:\R^n\to\R^m$, see the proof below.

Given any linear map $$L:\R^n\to\R^m$$ there exists a unique matrix $M$ such that $L(x)=Mx$ for all $x\in\R^n$.
And $M$ has columns given by the image of the standard basis $L(e_i)$ $$ M=\begin{bmatrix} L(\mathbf{e}_1) & \cdots & L(\mathbf{e}_n) \end{bmatrix} $$

Take any vector $v\in \R^n$, $$ v=\begin{pmatrix} v_1\\ \vdots\\ v_n \end{pmatrix}= \sum_{i=1}^n v_i \mathbf{e}_i $$ If we apply $L$ to $v$, by linearity we have $$ L(v)=\sum_{i=1}^n v_i L(\mathbf{e}_i) = \begin{bmatrix} L(\mathbf{e}_1) & \cdots & L(\mathbf{e}_n) \end{bmatrix} \begin{bmatrix} v_1\\ \vdots\\ v_n \end{bmatrix} = Mv $$ Where $M$ is the matrix with i-th column $L(\mathbf{e}_i)$.

Map between General Spaces

Suppose $V$ and $U$ are vector spaces with basis $$ A=\left\{v_1, v_2, \ldots, v_n\right\} \quad \text { and } \quad B=\left\{w_1, w_2, \ldots, w_m\right\} $$ Denote the column representation of vectors in $V, W$ with respective to $A, B$ by $$ v_A \quad w_{B} $$ If $$F: V \rightarrow W$$ is a linear mapping, the matrix representation of $F$ with respect to the basis $A$ and $B$ is given by $$ F_{A, B}:=\left[\begin{array}{llll} F\left(v_1\right)_{B} & F\left(v_2\right)_{B} & \cdots & F\left(v_m\right)_{B} \end{array}\right] \in \R^{m \times n} $$ For any vector $v\in V$, we have $$ F(v)_{B}=F_{A,B} v_{A} $$ $$F(v) = I_B^{-1}( F_{A,B} v_{A})$$

As a special case, if $V=W$ and $A,B$ are two basis of $V$, then $id_{A,B}$ is the change of bases matrix from $A$ to $B$. Let $S$ be the canonical basis of $V$, $A,B$ be two other bases of $V$, then the change of basis matrix can also be obtained (very inefficiently from) $$ id_{A,B} = [(B_1)_S \cdots (B_n)_S]^{-1} [(A_1)_S \cdots (A_n)_S] $$

Composition and Matrix Multiplication

Suppose we have two linear maps $$f: \R^n \to \R^m,\qquad g: \R^m \to \R^p$$ Represented by matrices $$M_f,\qquad M_g$$ Then the composition $$g \circ f : \R^n \to \R^p$$ is represented by the matrix $M_g M_f$, since $$ (g \circ f)(x) = g(f(x)) = M_g f(x) = M_g M_f x $$ to summarize, $$ g \circ f : x\mapsto M_g M_f x $$

The same is true for general linear maps Let $\mathcal{B}_U, \mathcal{B}_V, \mathcal{B}_W$ be bases of vector spaces $U, V, W$, respectively.

Let $F:$ $U \rightarrow V$ and $G: V \rightarrow W$ be linear maps. Then $$ (G \circ F)_{\mathcal{B}_U, \mathcal{B}_W}=G_{\mathcal{B}_V, \mathcal{B}_W}F_{\mathcal{B}_U, \mathcal{B}_V} $$

Image & Kernel

Let $F: V \rightarrow W$ be a linear map between vector spaces $V, W$

The image or range of $F$ is the set $$ \im F:=\{F(v): v \in V\} $$ The rank of $F$ is the dimension of $\im F$.
The kernel or null space of $F$ is the set $$ \ker F=\{v \in V: F(v)=\mathbf{0}\} $$ The nullity of $F$ is the dimension of $\ker F$.

If $ c_1,\cdots,c_m $ are columns, and $ r_1,\cdots,r_n $ are rows of some $ n\times m$ matrix $M$,

The span of the columns $ c_1,\cdots,c_m $ is called the column space of the matrix, $$ \op{col}(M) = \op{Span}(\{c_1,\cdots,c_m \})$$ the column rank of $M$ is the dimension of $\op{col}(M)$.

The span of the rows $ r_1,\cdots,r_n $ is called the row space of the matrix, $$ \op{row}(M)= \op{col}(M^T) = \op{Span}(\{r^T_1,\cdots,r^T_n \})$$ the row rank of $M$ is the dimension of $\op{row}(M)$. The row rank and column rank of a matrix are equal.
The result follows from the following oberservations:
- If $\{v_1,\cdots, v_n\}$ is linearly independent, then $\dim \span \{v_1,\cdots, v_n\}=n$
- If $E$ is an invertible matrix, then $\dim \span \{E v_1,\cdots, E v_n\}=n$
- Row operations are just multiplication on the left by an elementary matrix. And elementary matrices are invertible. It follows that $$ \dim \operatorname{col} EM = \dim \operatorname{col} M $$ Thus row operations doesn't change the column rank.
- Certainly row operations don't change the row rank. Convert the matrix to row echelon form via row operations. The number of pivots gives both the row rank and the column rank, thus they are equal.
The null space of $M$ is the set of all vectors $ \mathbf{v} $ such that $ M\mathbf{v} = 0 $. It is denoted by $\op{null}(M)$.
The nullity of $M$ is the dimension of $\op{null}(M)$.

All of these are subspaces of the corresponding parent vector spaces.

Let $F: V \rightarrow W$ be a linear map between vector spaces $V, W$ $$\im F \subspace W$$ $$\ker F \subspace V$$ If $M$ is an $n\times m$ matrix, then $$\op{col}(M) \subspace \R^n $$ $$ \op{null}(M) \subspace \R^m $$ $$\op{row}(M) = \op{col}(M^T) \subspace \R^m$$ If $$L: \R^{n} \rightarrow \R^{n}$$ is a linear map, and $M_L$ is the matrix representation of $L$, then $$\im{L} = \op{col}(M_L)$$ $$\ker{L} = \op{null}(M_L)$$

A vector space isomorphism is essentially a linear map with trial kernel and full image.

$F$ is one-to-one iff $\ker F=\{\mathbf{0}\}$.
$F$ is onto iff $\im F=U$.

For one-to-one : Suppose $$F(v)=F(u) \implies v=u$$ if $\ker{F}\ne \{0\}$, i.e. there exist $g\ne \vc{0} \in \ker{F}$ such that $F(g)=\vc{0}$ then for any $u$ $$F(u+g)= F(u)+F(g)=F(u)$$ but $u+v\ne u$ since $v\ne \vc{0}$, a contradiction.

On the other hand, if $\ker{F}=\{0\}$ then $$F(v-u)=F(v)-F(u)=0 \implies v-u\in \ker{F} \implies v-u=\vc{0}\implies v=u$$
The proof for onto is easy, I leave it as an exercise.

Let $V$ be a vector space and $F$ a linear map on $V$ to some vector space $U$. $$ \operatorname{dim} V=\operatorname{dim}(\ker F)+\operatorname{dim}(\operatorname{Im} F) $$ Let $F: V \rightarrow W$ be a linear map, and $V,W$ are of the same dimension.
Then $F$ is an isomorphism.

Statistics of Linear Maps

Determinant

Definition and Properties

Let $S_n$ be the set of all order $n$ permutations. The determinant of a square matrix is defined by \[ \det{A}=\sum_{\sigma \in S_{n}}(\operatorname{sgn} \sigma) a_{1 \sigma(1)} a_{2 \sigma(2)} \cdots a_{n \sigma(n)} \]

$$ \begin{aligned} \det{A^T}&=\det{A}\\ \det{AB}&=\det{A}\det{B} \end{aligned} $$
$$ \begin{aligned} \det{E_{R_i \leftrightarrow R_j}}=-1 &= \det{E_{C_i \leftrightarrow C_j}}\\ \det{E_{kR_i\rightarrow R_i}}=k &= \det{E_{kC_i\rightarrow C_i}}\\ \det{E_{R_i+kR_j\rightarrow R_i}}=1 &= \det{E_{C_i+kC_j\rightarrow C_i}} \end{aligned} $$

If $M$ is triangular, then $$\det{M} = \prod_{i=1}^n M_{i i}$$

Laplace Expansion

A more efficient way is to use the Laplace Expansion.

Consider $A=\left[a_{i j}\right] \in \R^{n\times n}$. Let $M_{i j}$ denote the submatrix of $A$ obtained by deleting its $i$ th row and $j$ th column. The determinant $$ \det M_{i j} $$ is called the minor of the element $a_{i j}$, and the cofactor of $a_{i j}$, denoted by $C_{i j}$, is defined by: $$ C_{i j}=(-1)^{i+j}\det M_{i j} $$ Then the determinant of $A$ is given by $$ \det A=\sum_{j=1}^{n} a_{i j} C_{i j} $$ the above formula is independent of $i$, where the laplace expansion is performed along the ith row.

One can also perform the laplace expansion along the $j$-th column, and the result is the same.

Usually the convention is to perform the expansion across the row or column with the most zeros.

Geometries of Determinant

A shear matrix is an elementary matrix of the form $$ E_{C_i+kC_j\rightarrow C_i} $$ Let $M$ be a matrix with columns $C_1,\ldots,C_n$.

The parallelepiped spanned by the columns of $M$ are essentially $$ P(M) = \{\lambda_1 C_1+\lambda_2 v_2+\cdots+\lambda_n C_n: \lambda_i\in [0,1]\} $$ Let $E$ be a shear matrix. Then $$ \operatorname{vol} P(ME) = \operatorname{vol} P(M)$$ If $E = E_{C_i+kC_j\rightarrow C_i}$, then $$ (ME)_{:,i} = C_i + k C_j $$ $$ \begin{aligned} P(ME) &= \{\lambda_1 C_1+\lambda_2 v_2+\cdots+ \lambda_i(C_i + k C_j) +\cdots+\lambda_n C_n: \lambda_i\in [0,1]\}\\ &= \{\lambda_1 C_1+\lambda_2 v_2+\cdots+ \lambda_i C_i +\cdots+ (\lambda_ik+\lambda_j) C_j +\cdots+\lambda_n C_n: \lambda_i\in [0,1]\}\\ \end{aligned} $$ Which is just a translation of all the points in $P(M)$ by $\lambda_ik C_j$ (this translation is not uniform, it dependes on $\lambda_i$, hence the name shearing).

The determinant of a square matrix $M\in \R^{n\times n}$ $$|\det M |= \operatorname{vol} M$$ is the signed volume of the parallelepiped spanned by the columns of $M$.

The result is easy to see when $M$ is diagonal.
The result is also "obvious" when the columns of $M$ is not linearly independent. In this case, the determinant is zero, and the volume of the parallelepiped is zero. Since the span of the columns of $M$ is just a "thin hyperplane".
Assume the columns of $M$ is linearly independent. We can use a series of shearing to transform $M$ into triangular form.

If $\det M = 0$, then the columns of $M$ are coplanar.

Adjoin & Inverse

The adjoint, $\adj{M}$, of a matrix $M$ is $$ \adj{M} = C^T $$ Where $C$ is the matrix of cofactors of $M$.

If $\det M \ne 0$, then $$M^{-1} = \frac{1}{\det M}\adj{M}$$ $$ \det{M^{-1}}=(\det{M})^{-1}$$ If $\det M = 0$, then $M$ is singular (not invertible).

Eigen Values

Definition and Computations

Let $ F:V\to V $ be a linear map on some vector space $ V $.

A scalar $\lambda$ is called an eigenvalue of $F$ if there exists a nonzero vector $v$ such that \[ F( v)=\lambda v \] Any vector satisfying this relation is called an eigenvector of $F$ associated with the eigenvalue $\lambda .$

The set of all eigenvalues of $F$ is called the spectrum of $F$, denoted by $$ \lambda(F) $$ The subspace of all eigenvectors of $F$ associated with $\lambda$ $$ E_F(\lambda) = \{v\in V: F(v)=\lambda v\}$$ is called the eigenspace associated with $\lambda$.

Abusing the notation a bit, we will also use $F$ as the matrix representation of $F$. \[ F( v)=\lambda v \implies (F-\lambda) v=0 \implies \det(F-\lambda I)=0 \] The polynomial $$\chi_F(\lambda)=\det(F-\lambda I)$$ is called the characteristic polynomial of $F$.

The eigenvalues of $F$ are given by the roots of $\chi_F(\lambda)$.

If $$A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$$ then its characteristic polynomial is $$\chi_A(\lambda) = \det(A - \lambda I) = (\lambda - 1)(\lambda - 4) - 6 = \lambda^2 - 5\lambda - 2$$ The roots of this polynomial are $$\lambda_1 = \frac{5 + \sqrt{33}}{2},\qquad \lambda_2 = \frac{5 - \sqrt{33}}{2}$$ which are the eigenvalues of $A$. Then any vectors in $$\ker(A - \lambda_i I),\qquad \lambda_i\in\{1,2\}$$ is an eigenvector for $\lambda_i$.

Geometry of eigen value and vectors

Let $\chi_F(\lambda)$ be the characteristic polynomial of a linear map $F$.

The algebraic multiplicity of $\lambda$ is the degree of $\lambda$ in $\chi_F$.

The geometric multiplicity of $\lambda$ is the dimension of $E_F(\lambda)$.

An eigenvalue is defective if its algebraic multiplicity is greater than its geometric multiplicity.

The matrix $$M=\begin{bmatrix} 3 & 1 \\ 0 & 3 \end{bmatrix}$$ has an eigenvalue $\lambda = 3$ with algebraic multiplicity $2$ and geometric multiplicity $1$.

And the matrix is invertible since $0$ is not an eigenvalue.
If $F:V\to W$ has no defective eigenvalues, then $$V = \bigoplus_{\lambda\in \lambda(F)} E_F(\lambda)$$

Diagonalization

Two matrices $A$ and $B$ are said to be similar if there exists a matrix $P$ such that $$P^{-1}AP=B$$ A matrix $A$ is said to be diagonalizable if $A$ is similar to a diagonal matrix. A square matrix $A$ is diagonalizable if and only if $A$ has no defective eigenvalues.

Equivalently, if and only if $A$ has $n$ linearly independent eigenvectors. If $M\in \R^{n\times n}$ has $n$ linearly independent eigenvectors $$ E = \{v_1,\cdots, v_n \} $$ with corresponding eigenvalues $$ \lambda_1,\cdots, \lambda_n $$ then $E$ is a basis for $\R^n$,

Let $S$ be the standard basis of $\R^n$
Let $P$ be the matrix whose columns are the eigenvectors of $M$.

Note that $P$ is just a change of basis matrix $$P= id_{E,S}$$
Let $D$ be the diagonal matrix with the eigenvalues of $M$ on the diagonal.

Note that $D$ is just the matrix representation of $M$ with respect to the basis $E$. $$D = M_{E,E}$$

Then $$ D = M_{E,E} = P^{-1}MP \implies M = PDP^{-1} $$ Which is obviouse from the following commutative diagram.

Singular Values

Eigenvalues are only defined for square matrices. For rectangular matrices, we can use the singular values. Let $A$ be a matrix. The singular values of $A$ are the square roots of the eigenvalues of $A^TA$.

Singular Value Decomposition

The Singular Value Decomposition of a matrix $A$ of size $m\times n$ is given by $$ A = U\Sigma V^T $$ where

$U$ is an $m\times m$ orthogonal matrix
$V$ is an $n\times n$ orthogonal matrix
$\Sigma$ is an $m\times n$ diagonal matrix with the singular values of $A$ on the diagonal.

$$\sigma_1 = \max_{\|x\|=1} \|Ax\|= \max_{\|x\|=1} x^TA^TAx$$ $$\sigma_2 = \max_{\|x\|=1} \|Ax\| \text{ s.t. } x\perp u_1$$

Principal Value Decomposition

Suppose $X$ is is a rectangular matrix of size $m\times n$. We want to find a vector $w$ such that the projection of $m$ data points $x_1,\cdots,x_m$ onto $w$ that retains the most variance / information. This is equivalent to finding the vector $w$ such that $$w= \arg\max_{w\in \S^{n-1} } \underbrace{\sum_{i=1}^{m} (x_i\cdot w)^2}_{\text{retained variance}}=\arg\max_{w\in \S^{n-1} } w^T X^TX w$$

Let $X$ be a matrix of size $m\times n$, with covariance matrix $C = X^TX$.

The Principal Value Decomposition of $X$ is given by $$ T = XW $$ where $W$ is the matrix whose columns are the eigenvectors of $C$.

Inner Product Spaces

A real vector space $\mathscr{V}$ is said to be an inner-product space if for each pair of elements $x$ and $y$ in $\mathscr{V}$, there is a number $\langle x, y\rangle$, called the inner product of $x$ and $y$, such that

$$\langle x+y, z\rangle=\langle x, z\rangle+\langle y, z\rangle$$
$$\langle\alpha x, y\rangle=\alpha\langle x, y\rangle$$ for all $x, y \in \mathscr{V}$ and $\alpha \in \mathbb{R}$.
$$\langle x, x\rangle \geq 0$$
$$\langle x, x\rangle=0 \iff x=0$$

The dot product\[\Inn{u}{v}:=u\cdot v= u^{T}v=u_1v_1+u_2v_2+\cdots+u_nv_n \] turns $\R^n$ into an inner-product space.
$\ms{P}$ with the following inner product is an inner product space \[ \langle f, g\rangle=\int_{0}^{1} f(t) g(t) d t \]
$ \R^{m\times n} $ is an inner product space with the following inner product \[ \Inn{A}{B}=\operatorname{tr}(B^{T}A) \]

Geometries of Inner Product

Length and Angle

We can extend the definition of norm and angle in $\R^n$ in the natural way The norm of an element $v$ of an inner-product space is defined to be $$ \|v\|=\sqrt{\langle v, v\rangle} $$ The angle between two vectors $ {v} $, $ {w} $ in $ \ms{V} $ is defined as \[\measuredangle( v, w) = \arccos\frac{\Inn{v}{w}}{\Nor{ v}\,\Nor{ w}} \]Two vectors $ v,w\in \ms{V} $ are perpendicular iff \[ \Inn{v}{w}=0 \]$ v,w \in \ms{V}$ are parallel if \[ v=\lambda w\qquad \text{ for some }\lambda\in \R \]

Cauchy-Schwarz$$ |\langle x, y\rangle| \leq\|x\|\|y\| \quad \text { for all } x, y \in \mathscr{V} $$
Triangle Inequality$$ \|x+y\| \leq\|x\|+\|y\| \quad \text { for all } x, y \in \mathscr{V} $$
Parallelogram Law$$ \|x+y\|^{2}+\|x-y\|^{2}=2\|x\|^{2}+2\|y\|^{2} \quad \text { for all } x, y \in \mathscr{V} $$
Law of Cosine \[ \Nor{x-y}^{2}=\Nor{x}^{2}+\Nor{y}^{2}-2\Nor{x}\Nor{y}\cos(\measuredangle(x,y)) \]

Projections and Complements

The orthogonal complement of a subset $\mathscr{S}$ of $V$ is the set $\mathscr{S}^{\perp}$, such that $$x \in \mathscr{S}^{\perp} \iff \langle x, y\rangle=0,\quad \forall y \in \mathscr{S}$$

Let $S$ be a subset of a vector space $V$. Then $S^{\perp}$ is a subspace of $V$. Let $A\in \R^{n\times m}$, $$ \ker A = (\im A^T)^{\perp} $$

Let $S$ be a subset of a vector space $V$, consider the maps $$P_S:=x\mapsto \hat{x}, \quad I-P_S$$ where $\hat{x}$ the point in $S$ that minimizes $\|x-\hat{x} \|$, then

$P_S: V\to S$ is surjective.
$I-P_S: V\to S^\perp$ is surjective.

$P_S$ is called the projection onto $S$.

If $u$ is a unit vector in $V$, then $$P_{\span{u}}x = \underbrace{\inn{u}{x}}_{comp_u(x)} u = uu^T x$$ If $u$ is not a unit vector, then $$P_{\span{u}}x = u(u^Tu)^{-1}u^T x = \dfrac{\inn{u}{x}}{\inn{u}{u}}u$$ for proof and explaination, please see my calculus notes.
If $\mathbf{x}_{i} \in \mathbb{R}^{n}, i=1, \ldots, m$, and $M=\operatorname{Span}\left\{\mathbf{x}_{1}, \ldots, \mathbf{x}_{m}\right\}$ then if $XX' $ is invertible $$P_M \mathbf{x}=X\left(X^{\prime} X\right)^{-1} X^{\prime} \mathbf{x}$$ where $X$ is the matrix with the i-th column of $X$ being $\mathbf{x}_{i}$.

Orthogonormal Decomposition

Orthogonormal

Consider a set $S=\left\{u_{1}, u_{2}, \ldots, u_{r}\right\}$ of nonzero vectors in an inner product space $V$.

$S$ is orthogonal if each pair of vectors in $S$ are orthogonal,$$\left\langle u_{i}, u_{j}\right\rangle=0 \qquad i \neq j$$ and $S$ is called orthonormal if $S$ is orthogonal and each vector in $S$ has unit length. That is, \[\Inn{u_i}{u_j}= \begin{cases} 0\qquad&i\ne j\\ 1&i=j \end{cases} \]

Consider the following subset of $ C[-\pi,\pi] $\[S= \{1, \cos t, \cos 2 t, \cos 3 t, \ldots, \sin t, \sin 2 t, \sin 3 t, \ldots\} \] $ S $ is orthogonal, but not orthonormal.
\[ E=\left\{e_{1}, e_{2}, e_{3}\right\}=\{\begin{pmatrix} 1\\0\\0 \end{pmatrix},\begin{pmatrix} 0\\1\\0 \end{pmatrix},\begin{pmatrix} 0\\0\\1 \end{pmatrix}\} \] is an orthonormal basis for $\R^3$.
The following subset of $L^2[-\pi,\pi]$ is orthonormal,$$\{e_n:=\exp(inx):n\in \mb{Z} \}$$

Suppose $U = \left\{u_{1}, u_{2}, \ldots, u_{r}\right\}$ is an orthogonal set of nonzero vectors. Then $U$ is linearly independent.
\[ \left\|u_{1}+u_{2}+\cdots+u_{r}\right\|^{2}=\left\|u_{1}\right\|^{2}+\left\|u_{2}\right\|^{2}+\cdots+\left\|u_{r}\right\|^{2} \]
If in addition, $U$ is orthonormal, then let $M=\op{Span}(U)\subset \ms{V}$ $$ P_{M} x=\sum_{i=1}^{k}\left\langle x, u_{i}\right\rangle u_{i} \quad \text { for all } x \in \mathscr{V} $$

Gram-Schmidt Orthogonalization

Given ${v_1,\cdots,v_n}$ be a basis of $V$

Let $$u_1=v_1$$
For $i=2,\cdots,n$, let $$u_i=v_i-\sum_{j=1}^{i-1}\frac{\Inn{v_i}{u_j}}{\Inn{u_j}{u_j}}u_j$$

$$\{u_1,\cdots,u_n\}$$ is an orthogonal basis of $V$.

Spectral Theorem

A matrix $U\in \R^{n\times n}$ is called orthogonal if $$U^TU=I$$ A matrix $M$ is orthogonally diagonalizable if $$\exists \text{ orthogonal } U\ : \qquad U^TMU=D$$ where $D$ is a diagonal matrix.

If $A\in \R^{n\times n}$ is orthogonal, then $$\{A_{:,i},\cdots,A_{:,n}\}\qquad \text{is a basis of }\R^n$$ $$A^{-1}=A^T$$ $$\|Ax\|=\|x\|\qquad \forall x\in \R^n$$ $$\Inn{Ax}{Ay}=\Inn{x}{y}\qquad \forall x,y\in \R^n$$

$A\in \R^{n\times n}$ is orthogonally diagonalizable if and only if $A$ is symmetric.

QR Decomposition

Let $A\in \R^{m\times n}$ be invertible. Then there exists a decomposition $$A=QR$$ where $Q\in \R^{m\times m}$ is orthogonal and $R\in \R^{m\times n}$ is upper triangular.

Apply Gram-Schmidt to the columns of $A$ to get $Q$. Thenz $$A=QT$$ where $$T=Q^TA$$ is upper triangular.

Functional Analysis

A linear algebra class typically studies finite dimensional vector spaces. The study of infinite dimensional vector spaces is called functional analysis.

We have already encountered a few examples of infinite dimensional vector spaces.

$\ms{P}$, the space of all polynomials.
$\R^\R$, the space of all real valued functions on $\R$.
$C[-\pi,\pi]$, the space of all continuous functions on $[-\pi,\pi]$.
$L^2[-\pi,\pi]$, the space of all square integrable functions on $[-\pi,\pi]$.

Banach Spaces

Normed Spaces

A norm on a vector space $V$ is a function $\norm{\cdot}$ such that

$\norm{x}\ge 0$ for all $x\in V$.
$\norm{x}=0$ if and only if $x=0$.
$\norm{\alpha x}=\left|\alpha\right|\norm{x}$ for all $\alpha\in \R$ and $x\in V$.
$\norm{x+y}\le \norm{x}+\norm{y}$ for all $x,y\in V$.

$(V, \norm{\cdot})$ is called a normed space.

If $V$ is an inner product space, then $$\norm{x}:=\sqrt{\langle x,x\rangle}$$ is a norm. Thus every inner product space is a normed space.
Not every normed space is an inner product space.
Consider $C[0,1]$ with the norm $$\norm{x}:=\sup_{t\in [0,1]}\left|x(t)\right|$$ and $$f(x)=x,\ g(x)=x^2 \in C[0,1]$$ The parallelogram law does not hold for these functions $$ \begin{aligned} \|f+g\|^2+\|f-g\|^2 &= 4 + 1/16 \\ &\ne 2+2 =2\|f\|^2+2\|g\|^2 \end{aligned}$$

Banach Spaces

A sequence $\left\{x_{n}\right\}\subset \ms{V}$ is a Cauchy sequence if $$\forall \varepsilon > 0, \exists N\in \N^+ : \left\|x_{n}-x_{m}\right\|<\varepsilon,\qquad \forall m, n>N$$ A Banach space is a normed space such that every Cauchy sequence has a limit.

A Banach Space with norm induced by an inner product is called a Hilbert Space. The set of all real valued random variables with finite second moment is a Hilbert space. For details see notes.

Continuous Linear Operators

A linear operator $T:V\to W$ is continuous iff $$\sup_{\norm{v}=1}\norm{T(v)}< \infty$$ $$\norm{T} = \sup_{\norm{v}=1}\norm{Tv}$$ Defines a norm on $$ \ms{L}(V,W) = \left\{T: V\to W \mid A \text{ is linear and continuous} \right\} $$

Linear Differential Operatior

Green's Functions

Let $L$ be a linear differential operator on a space of functions $f$.

A Green's function for $L$ is a function $G(x,t)$ such that $$LG(x,t)=\delta(x-t)$$ where $\delta(x-t)$ is the Dirac delta function.

If $G(x,t)$ is a Green's function for $$LG(x,t)=\delta(x-t)$$ then $$u(x)=\int_{a}^{b}G(x,t)f(t)dt$$ is the solution to the inhomogeneous equation $$Lu(x)=f(x)$$ $$\int_{a}^{b}LG(x,t)f(t)dt=\int_{a}^{b}\delta(x-t)f(t)dt=f(x)$$ $$L\underbrace{\int_{a}^{b}G(x,t)f(t)dt}_{u(x)}=f(x)$$

Appendix

Symmetric Groups

Permutations

A permutation is a bijection \[ \sigma: [n] \to [n] \] The set of all, $n!$ permutations of $[n]=\{1,2,\cdots,n\}$ is denoted by $S_n$.

A cycle is a permutation such that \[ \forall a\in [n], \forall b\in [n], \exists i\in \N : \sigma^n(a)=b \] We will be using the cycle notation $(x_1,\cdots,x_n)$ to denote the cycle $$\sigma(x_i)=x_{i+1}\quad i\ne n,\qquad \sigma(x_n)=x_1$$ A transposition is a cycle of length 2.

The identity permutation $$\sigma = [1,2,3,4,5,6,7]$$ or equivalently $$\sigma(i)=i$$
The cyclic permutation $$\sigma = [2,3,4,5,1]$$ or equivalently in cycle notation $$\sigma=(1,2,3,4,5)$$
A permutation with 2 cycles $$\sigma=(1,2,3,4)(5,6,7)$$ or equivalently in list notation $$\sigma = [2,3,4,1,6,7,5]$$
A permutation with a fixed point and transpositions $$\sigma=(1,3)(4,5)(6,7)(2)$$ or equivalently in list notation $$\sigma = [3,2,1,5,4,7,6]$$

Define the composition (product) of two permutations $\sigma$ and $\tau$ as \[ \sigma\tau(i)=\sigma(\tau(i)) \] If $\sigma$ can be written as a product of $m$ transpositions, then the sign of a permutation $\sigma$ is defined as \[ \operatorname{sgn}(\sigma)= (-1)^m \]

Every permutation $\sigma$ can be written as a product of disjoint cycles.
Every cycle can be written as a product of transpositions.
Every permutation $\sigma$ can be written as a product of transpositions.

The permutation $$\sigma = [2,3,4,1,6,7,5] = (1,2,3,4)(5,6,7) = \color{green}(1,4)(1,3)(1,2) \color{blue}(5,6)(6,7)$$ is an odd permutation.

Symmetric Groups

A group is a set $G$ with a binary operation $\cdot :G\to G$ such that

$\exists e \in G$ such that $$e\cdot x=x\cdot e=x\qquad \forall x\in G$$
$\forall x\in G$, $\exists x^{-1}\in G$ such that $$x\cdot x^{-1}=x^{-1}\cdot x=e$$

A group homomorphism is a function $f:G\to H$ such that

$$f(e)=e$$
$$f(x\cdot y)=f(x)\cdot f(y)$$

A group isomorphism is a bijective group homomorphism.

$S_n$ with the composition operation is a group , called the symmetric group $S_n$. Every group is isomorphic to a subgroup of $S_n$.

Algebraic Structures

Quotient Group, Rings, skew-fields , Fields,

Complex-ify & Real-ify

$$ a+bi \mapsto \begin{bmatrix} a & -b \\ b & a \end{bmatrix} $$ $$ a+bi+cj+dk \mapsto \left[\begin{array}{cc} a+b i & c+d i \\ -c+d i & a-b i \end{array}\right] \mapsto \left[\begin{array}{cccc} a & -b & -c & -d \\ b & a & -d & c \\ c & d & a & -b \\ d & -c & b & a \end{array}\right]= a\left[\begin{array}{cccc} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{array}\right]+b\left[\begin{array}{cccc} 0 & -1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 \end{array}\right] +c\left[\begin{array}{cccc} 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \end{array}\right]+d\left[\begin{array}{cccc} 0 & 0 & 0 & -1 \\ 0 & 0 & -1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{array}\right] $$

Polynomials

Every polynomial $p(x)$ of degree $n$ with complex coefficients has $n$ complex roots. $$ \exists \alpha_1,\dots,\alpha_n\in\mathbb{C} : p(x)=\prod_{i=1}^n(x-\alpha_i)$$

Matrix Calculus

Limits

Gelfand's formula: Given any matrix norm $\|\cdot\|$, $$ \rho(A):= \max\left(\{|\lambda_i| : \lambda_i \text{ an eigenvalue of }A \} \right) =\lim _{k=\infty}\left\|A^k\right\|^{1 / k} $$

Derivative

$$\nabla \left(\mathbf{x}^{T} \mathbf{A} \mathbf{x}\right)=( \mathbf{A}+ \mathbf{A}^{T} )\mathbf{x}$$ $$ \nabla x^TA^TAx = 2A^TAx$$ \begin{aligned} \nabla x^TA^TAx &= \left[\lim_{\epsilon \to 0} \frac{(x+\epsilon e_i)^TA^TA(x+\epsilon e_i)-x^TA^TAx}{\epsilon}\right] \\ &= \left[\lim_{\epsilon \to 0} \frac{x^TA^TAx+2\epsilon x^TA^TAe_i+\epsilon^2 e_i^TA^TAe_i-x^TA^TAx}{\epsilon} \right]\\ &=\left[ \lim_{\epsilon \to 0} \frac{2\epsilon x^TX^TXe_i+\epsilon^2 e_i^TA^TAe_i}{\epsilon}\right] \\ &= 2A^TAx \end{aligned}

Differential Equations

General Form

A first-order matrix differential equation typically has the form:

$$\mathbf{X}'(t) = A(t)\mathbf{X}(t),$$ where:

$\mathbf{X}(t)$ is an $n \times n$ matrix of unknown functions.
$A(t)$ is an $n \times n$ matrix of coefficients (possibly time-dependent).
$\mathbf{X}'(t)$ represents the derivative of $\mathbf{X}(t)$ with respect to $t$.

Constant Coefficient Matrices

For constant coefficient matrices ($A(t) = A$), the solution can be expressed using the matrix exponential:

$$\mathbf{X}(t) = e^{At}\mathbf{X}_0,$$ where:

$e^{At}$ is the matrix exponential, defined as:

$$e^{At} = \sum_{k=0}^{\infty} \frac{(At)^k}{k!},$$

$\mathbf{X}_0$ is the initial condition ($\mathbf{X}(0) = \mathbf{X}_0$).