Appendix A. MATHEMATICAL PRELIMINARIES¶

A.1. Linear Algebra¶

Vector notation and definitions¶

An $n$-dimensional vector $\V{x}$ is a tuple of real numbers $\V{x}=(x_1,\ldots,x_n) \in \Reals^n$. In later portions of this course, we will usually drop the boldface.

Vectors can be added and subtracted component-wise, can be multiplied by elements of $\Reals$, and can be divided by elements of $\Reals \setminus \{0\}$. There is a zero vector $\V{0}$ with all elements equal to zero. Each vector $\V{x}$ has a component-wise negation, $-\V{x}$. We will also on occasion refer to the standard basis vectors $\V{e_1},\ldots,\V{e_n}$, where $\V{e_i}$ has all elements equal to 0 except for the $i$'th element, which is equal to 1. More commonly, we refer to $\V{e_1}$ as the x-axis, $\V{e_2}$ as the y-axis, and so on.

Dot product. The dot product is a function that takes two vectors $\V{x}=(x_1,\ldots,x_n)$ and $\V{y}=(y_1,\ldots,y_n)$ and returns a real number, and is given by the expression $$\V{x} \cdot \V{y} = \sum_{i=1}^n x_i y_i.$$

For example, $\V{x} \cdot \V{e_i} = x_i$.

Orthogonal vectors. Two vectors whose dot product is zero are called orthogonal.

For example, $\V{e_i} \cdot \V{e_j} = 0$ for any pair $i\neq j$, and so $\V{e_i}$ and $\V{e_j}$ are orthogonal. Also, $ \begin{bmatrix}1 \\ 2 \end{bmatrix} \cdot \begin{bmatrix} -6 \\ 3 \end{bmatrix} = 0$, so these vectors are orthogonal as well.
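The dot product and the orthogonality test above can be sketched in a few lines. This is a minimal illustration that assumes vectors are stored as plain Python lists; the names `dot` and `is_orthogonal` and the tolerance `tol` are illustrative choices, not notation from the text.

```python
def dot(x, y):
    """Dot product of two equal-length sequences of numbers."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    return sum(xi * yi for xi, yi in zip(x, y))


def is_orthogonal(x, y, tol=1e-12):
    """True if the dot product is zero up to a small tolerance (assumed value)."""
    return abs(dot(x, y)) <= tol


x = [1, 2]
y = [-6, 3]
e1 = [1, 0]
e2 = [0, 1]

print(dot(x, y))              # 1*(-6) + 2*3 = 0
print(dot(x, e2))             # picks out the second component of x: 2
print(is_orthogonal(e1, e2))  # True
```

The tolerance matters only for floating-point inputs; with integer entries the comparison is exact.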

Norms. A norm describes some notion of vector magnitude. The standard Euclidean norm $\|\cdot\|$ is defined as $\|\V{x}\|=\sqrt{\V{x} \cdot \V{x}}$.

Some identities are $\|\V{0}\|=0$, $\|\V{x}\| > 0$ if $\V{x} \neq \V{0}$, and $\|c\V{x}\| = |c|\,\|\V{x}\|$ for all real values $c$.

Unit vectors. A unit vector is a vector with unit norm: $\|\V{x}\|=1$.

Euclidean distance. The Euclidean distance between two vectors is given by the norm of the difference between the vectors: $d(\V{x},\V{y}) = \|\V{x}-\V{y}\|$.

It satisfies all the criteria of a metric, and hence $\Reals^n$ equipped with this distance is a metric space.

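Cosine angle formula. The cosine of the angle $\theta$ between two unit vectors $\V{a}$ and $\V{b}$ is equal to their dot product, i.e. $\cos(\theta) = \V{a} \cdot \V{b}$. For nonzero vectors that are not unit length, first divide each by its norm: $\cos(\theta) = \frac{\V{a} \cdot \V{b}}{\|\V{a}\|\,\|\V{b}\|}$.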
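The norm, distance, and angle definitions above translate directly into code. The sketch below assumes vectors are Python lists; the helper names (`norm`, `distance`, `normalize`, `angle`) are hypothetical. The angle is computed by first scaling both vectors to unit length, so the unit-vector formula applies.

```python
import math


def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))


def norm(x):
    """Euclidean norm: square root of the dot product of x with itself."""
    return math.sqrt(dot(x, x))


def distance(x, y):
    """Euclidean distance: norm of the component-wise difference."""
    return norm([xi - yi for xi, yi in zip(x, y)])


def normalize(x):
    """Scale a nonzero vector to unit norm."""
    n = norm(x)
    if n == 0:
        raise ValueError("the zero vector cannot be normalized")
    return [xi / n for xi in x]


def angle(x, y):
    """Angle in radians between two nonzero vectors."""
    c = dot(normalize(x), normalize(y))
    c = max(-1.0, min(1.0, c))  # guard against round-off outside [-1, 1]
    return math.acos(c)


print(norm([3, 4]))               # 5.0
print(distance([1, 1], [4, 5]))   # 5.0
print(angle([1, 0], [0, 1]))      # pi/2, up to round-off
```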

Linear Combinations. $\V{x}$ is a linear combination of a set of vectors $\{\V{a_1},\ldots,\V{a_m}\}$ if $$\V{x} = \sum_{i=1}^m u_i \V{a_i}$$ for some set of numbers $u_1,\ldots,u_m$.
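As a small illustration (the specific numbers are chosen here, not taken from the text), every vector is a linear combination of the standard basis vectors, with its own components as the coefficients: $$\begin{bmatrix} 3 \\ -2 \end{bmatrix} = 3 \begin{bmatrix} 1 \\ 0 \end{bmatrix} + (-2) \begin{bmatrix} 0 \\ 1 \end{bmatrix} = 3\V{e_1} - 2\V{e_2},$$ and in general $\V{x} = \sum_{i=1}^n x_i \V{e_i}$.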

Matrices¶

A matrix $A$ represents a linear transformation of an $n$-dimensional vector space to an $m$-dimensional one. It is given by an $m\times n$ array of real numbers. Usually matrices are denoted as uppercase letters (e.g., $A, B, C$), with the entry in the $i$'th row and $j$'th column denoted in the subscript $\cdot_{i,j}$, or when it is unambiguous, $\cdot_{ij}$ (e.g., $A_{1,2}, A_{1p}$). $$A = \left[ \begin{array}{ccc} A_{1,1} & \cdots & A_{1,n} \\ \vdots & & \vdots \\ A_{m,1} & \cdots & A_{m,n} \end{array}\right]$$

Matrix-Vector Product¶

An $m\times n$ matrix $A$ transforms vectors $\V{x} =(x_1,\ldots,x_n)$ into $m$-dimensional vectors $\V{y} = (y_1,\ldots,y_m) = A\V{x}$ as follows: $$ \begin{split} y_1 = \sum_{j=1}^n A_{1j} x_j \\ \ldots \\ y_m = \sum_{j=1}^n A_{mj} x_j \\ \end{split}$$ Or, more concisely, $y_i = \sum_{j=1}^n A_{ij} x_j$ for $i=1,\ldots,m$. (Note that matrix-vector multiplication is not commutative: the matrix must be written on the left, and $\V{x} A$ is an invalid operation.)
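The formula $y_i = \sum_j A_{ij} x_j$ can be sketched as one sum per row. This assumes a matrix is stored as a list of $m$ rows, each a list of $n$ numbers; `matvec` is an illustrative name.

```python
def matvec(A, x):
    """Return y = A x, with A given as a list of m rows of length n."""
    n = len(x)
    if any(len(row) != n for row in A):
        raise ValueError("each row of A must have len(x) entries")
    # y_i is the sum over j of A[i][j] * x[j]
    return [sum(row[j] * x[j] for j in range(n)) for row in A]


A = [[1, 2, 3],
     [4, 5, 6]]   # 2 x 3 matrix
x = [1, 0, -1]    # 3-dimensional input

y = matvec(A, x)  # 2-dimensional output
print(y)          # [-2, -2]
```

Each output component is the dot product of one row of the matrix with the input vector.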

Linearity of matrix-vector multiplication¶

We can see that matrix-vector multiplication is linear, that is $A(a\V{x}+b\V{y}) = a A \V{x} + b A \V{y}$ for all $a$, $b$, $\V{x}$, and $\V{y}$. It is also linear in terms of component-wise addition and multiplication of matrices, as long as the matrices are of the same size. More precisely, if $A$ and $B$ are both $m\times n$ matrices, then $(a A + b B)\V{x} = a A\V{x} + b B\V{x}$ for all $a$, $b$, and $\V{x}$.

Identity matrix¶

One special matrix that occurs frequently is the $n\times n$ identity matrix $I_n$, which has 0's in all off-diagonal positions $I_{ij}$ with $i\neq j$, and 1's in all diagonal positions $I_{ii}$. It is significant because $I_n \V{x} = \V{x}$ for all $\V{x} \in \Reals^n$.
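Both the linearity property and the identity property can be spot-checked numerically. The sketch below uses integer entries so that the comparisons are exact; the helper names `matvec`, `identity`, and `axpby` are hypothetical, and a check on one example is an illustration, not a proof.

```python
def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def identity(n):
    """n x n matrix with 1's on the diagonal and 0's elsewhere."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def axpby(a, x, b, y):
    """Return the vector a*x + b*y."""
    return [a * xi + b * yi for xi, yi in zip(x, y)]


A = [[2, -1, 0],
     [1, 3, 4]]
x = [1, 2, 3]
y = [0, -1, 5]
a, b = 3, -2

lhs = matvec(A, axpby(a, x, b, y))             # A(ax + by)
rhs = axpby(a, matvec(A, x), b, matvec(A, y))  # a Ax + b Ay

print(lhs, rhs)                  # [-2, 23] [-2, 23]
print(matvec(identity(3), x))    # [1, 2, 3]
```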

Matrix Product¶

When two linear transformations are performed one after the other, the result is also a linear transformation. Suppose $A$ is $m\times n$, $B$ is $n \times p$, and $\V{x}$ is a $p$-dimensional vector, and consider the result of $A(B\V{x})$ (that is, first multiplying by $B$ and then multiplying the result by $A$). We see that $$B\V{x} = \left(\sum_{j=1}^p B_{1j} x_j,\ldots,\sum_{j=1}^p B_{nj} x_j\right)$$ and, for any $n$-dimensional vector $\V{y}$, $$A \V{y} = \left(\sum_{k=1}^n A_{1k} y_k,\ldots,\sum_{k=1}^n A_{mk} y_k\right).$$ Substituting $\V{y} = B\V{x}$, so that $y_k = \sum_{j=1}^p B_{kj} x_j$, gives $$A (B \V{x}) = \left(\sum_{k=1}^n A_{1k} \left(\sum_{j=1}^p B_{kj} x_j\right),\ldots,\sum_{k=1}^n A_{mk} \left(\sum_{j=1}^p B_{kj} x_j\right)\right).$$ Rearranging the summations, we see that $$A (B \V{x}) = \left(\sum_{j=1}^p \left(\sum_{k=1}^n A_{1k} B_{kj}\right) x_j,\ldots,\sum_{j=1}^p \left(\sum_{k=1}^n A_{mk} B_{kj}\right) x_j\right).$$ In other words, we have $A(B\V{x}) = C \V{x}$ if we form the $m \times p$ matrix $C$ such that $$C_{ij} = \sum_{k=1}^n A_{ik} B_{kj}.$$ This is exactly the definition of the matrix product, and we say $C=AB$. The entry $C_{ij}$ can also be obtained by taking the dot product of the $i$'th row of $A$ and the $j$'th column of $B$.
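The definition $C_{ij} = \sum_k A_{ik} B_{kj}$ can be sketched as a triple loop, here written as nested comprehensions. Matrices are assumed to be lists of rows, and `matmul` and `matvec` are illustrative names. The example also confirms on one input that applying $B$ and then $A$ gives the same vector as applying $C = AB$ once.

```python
def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def matmul(A, B):
    """Return C = A B, where A is m x n and B is n x p (lists of rows)."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("number of columns of A must equal number of rows of B")
    p = len(B[0])
    # C[i][j] is the dot product of row i of A with column j of B
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(A))]


A = [[1, 2],
     [0, 1],
     [3, -1]]         # 3 x 2
B = [[2, 0, 1, 1],
     [1, -1, 0, 2]]   # 2 x 4
x = [1, 2, 3, 4]      # 4-dimensional

C = matmul(A, B)      # 3 x 4
print(C)                      # [[4, -2, 1, 5], [1, -1, 0, 2], [5, 1, 3, 1]]
print(matvec(A, matvec(B, x)))  # [23, 7, 20]
print(matvec(C, x))             # [23, 7, 20]
```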

Matrix product is associative but not commutative¶

By the above derivation we can drop the parentheses: $A(B\V{x}) = (AB)\V{x}$. So, matrix-vector and matrix-matrix multiplication are associative. Note again however that matrix-matrix multiplication is not commutative, that is, $AB \neq BA$ in general.
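A small example makes the dependence on order concrete (the two matrices are chosen here for illustration): $$A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \quad B = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \quad AB = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \quad BA = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}.$$ Both products are defined and have the same size, yet they differ. When $A$ and $B$ are not square, one of the two products may not even be defined.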

Column and row vectors¶

Note that if we were to write an $n$-dimensional vector $\V{x}$ stacked in an $n\times 1$ matrix $x$ (denoted in lowercase), we can turn the matrix-vector product $\V{y}=A\V{x}$ into the matrix product $y = A x$. Here, if $A$ is an $m\times n$ matrix, then $y$ is an $m\times 1$ matrix. $$\left[ \begin{array}{c} y_1 \\ \vdots \\ y_m \end{array}\right] = \left[ \begin{array}{ccc} A_{1,1} & \cdots & A_{1,n} \\ \vdots & & \vdots \\ A_{m,1} & \cdots & A_{m,n} \end{array}\right] \left[ \begin{array}{c} x_1 \\ \vdots \\ x_n \end{array}\right]$$ Hence, there is a one-to-one correspondence between vectors and matrices with one column. These matrices are called column vectors and will be our default notation for vectors throughout the rest of the course. We will occasionally also deal with row vectors, which are matrices with a single row.

Transpose¶

The transpose $A^T$ of a matrix $A$ simply switches $A$'s rows and columns. $$(A^T)_{ij} = A_{ji}.$$ If $A$ is $m \times n$, then $A^T$ is $n \times m$.

Symmetric matrix. A square matrix $A$ is symmetric iff $A = A^T$.

Matrix Inverse¶

An inverse $A^{-1}$ of an $n\times n$ square matrix $A$ is a matrix that satisfies the following equation: $$A A^{-1} = A^{-1} A = I_n$$ where $I_n$ is the identity matrix. Not all square matrices have an inverse; a matrix with no inverse is called non-invertible (or singular). Invertible matrices are significant because, for any $b$, the system of linear equations $Ax = b$ has the unique solution $x = A^{-1} b$. If the matrix is not invertible, then such an equation may or may not have a solution, depending on $b$.
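For a concrete case, the sketch below inverts a $2\times 2$ matrix and uses the result to solve $Ax = b$. It relies on the standard closed-form $2\times 2$ inverse (swap the diagonal, negate the off-diagonal, divide by the determinant $ad - bc$), which is not derived in the text, and on exact rational arithmetic from Python's `fractions` module. The names `inverse_2x2` and `matvec` are illustrative.

```python
from fractions import Fraction


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def inverse_2x2(A):
    """Inverse of a 2 x 2 matrix via the closed-form formula."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular (not invertible)")
    det = Fraction(det)  # exact arithmetic, assuming integer or rational entries
    return [[d / det, -b / det],
            [-c / det, a / det]]


A = [[4, 7],
     [2, 6]]
b = [1, 2]

A_inv = inverse_2x2(A)
x = matvec(A_inv, b)   # candidate solution of A x = b

print(x)               # [Fraction(-4, 5), Fraction(3, 5)]
print(matvec(A, x))    # [Fraction(1, 1), Fraction(2, 1)], i.e. b
```

For larger systems, forming the inverse explicitly is rarely how $Ax=b$ is solved in practice; this example is only meant to make the definition tangible.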

Orthogonal matrix. An orthogonal matrix is a square matrix that satisfies $A A^T = I_n$. In other words, its transpose is its inverse.
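A familiar example (not discussed in the text above) is the planar rotation by an angle $\theta$. Writing $c = \cos\theta$ and $s = \sin\theta$, $$R = \begin{bmatrix} c & -s \\ s & c \end{bmatrix}, \qquad R R^T = \begin{bmatrix} c & -s \\ s & c \end{bmatrix} \begin{bmatrix} c & s \\ -s & c \end{bmatrix} = \begin{bmatrix} c^2 + s^2 & cs - sc \\ sc - cs & s^2 + c^2 \end{bmatrix} = I_2,$$ so $R^{-1} = R^T$, which is the rotation by $-\theta$.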

Matrix identities¶

Identities involving the transpose:

$(cA)^T = c A^T$ for any real value $c$.
$(A+B)^T = A^T + B^T$.
$(AB)^T = B^T A^T$.
All $1\times 1$ matrices are symmetric, the identity matrix is symmetric, and all uniform scalings of a symmetric matrix are symmetric.
$A + A^T$ is symmetric.
The dot product $\V{x} \cdot \V{y}$ is equal to $x^T y$, with $x$ and $y$ denoting the column vector representations of $\V{x}$ and $\V{y}$, respectively.
$x^T A y = y^T A^T x$, with $x$ and $y$ column vectors.
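Two of the transpose identities above can be spot-checked on a small example. This is a sketch under the assumption that matrices are lists of rows and that column vectors are stored as $n \times 1$ matrices; `transpose` and `matmul` are illustrative names, and one example is a sanity check, not a proof.

```python
def transpose(A):
    """Swap rows and columns: result[i][j] = A[j][i]."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Matrix product; entry (i, j) is row i of A dotted with column j of B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


A = [[1, 2, 0],
     [3, -1, 4]]    # 2 x 3
B = [[2, 1],
     [0, 5],
     [-3, 1]]       # 3 x 2

# (AB)^T = B^T A^T
lhs = transpose(matmul(A, B))
rhs = matmul(transpose(B), transpose(A))
print(lhs, rhs)     # [[2, -6], [11, 2]] [[2, -6], [11, 2]]

# x^T A y = y^T A^T x, with x and y column vectors
x = [[1], [2]]          # 2 x 1
y = [[1], [0], [-1]]    # 3 x 1
s1 = matmul(matmul(transpose(x), A), y)
s2 = matmul(matmul(transpose(y), transpose(A)), x)
print(s1, s2)       # [[-1]] [[-1]]
```

The second identity holds because $x^T A y$ is a $1\times 1$ matrix, and every $1\times 1$ matrix equals its own transpose.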

Identities involving the inverse:

$I_n^{-1} = I_n$.
$(cA)^{-1} = \frac{1}{c} A^{-1}$ for any real value $c\neq 0$.
$(AB)^{-1} = B^{-1}A^{-1}$ if both $B$ and $A$ are invertible.
If $A$ and $B$ are invertible, then $(ABA^{-1})^{-1} = A B^{-1}A^{-1}$.
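The product rule for inverses follows from associativity. Multiplying the claimed inverse by $AB$ gives $$(AB)(B^{-1}A^{-1}) = A (B B^{-1}) A^{-1} = A I_n A^{-1} = A A^{-1} = I_n,$$ and the product in the other order, $(B^{-1}A^{-1})(AB)$, collapses to $I_n$ in the same way. The last identity in the list is verified similarly: $$(ABA^{-1})(A B^{-1} A^{-1}) = A B (A^{-1} A) B^{-1} A^{-1} = A (B B^{-1}) A^{-1} = A A^{-1} = I_n.$$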

Matrix Pseudoinverse¶

The pseudoinverse generalizes the inverse to matrices that have no inverse, including matrices that are not square. For an $m \times n$ matrix $A$, the pseudoinverse $A^+$ is defined as the $n \times m$ matrix that satisfies the following four conditions:

$A A^+ A = A$
$A^+ A A^+ = A^+$
$(A A^+)^T = A A^+$
$(A^+ A)^T = A^+ A$

It has the following properties:

If $A$ is invertible, then $A^+ = A^{-1}$.
If multiple solutions to $A x = b$ exist, then $x = A^+ b$ is the solution that minimizes $\| x \|$.
If no solutions to $A x = b$ exist, then $x = A^+ b$ is a vector that minimizes the residual $\| A x - b \|$ (least squares solution).
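As a small worked example (the matrix and right-hand side are chosen here for illustration), take the $2 \times 1$ matrix $$A = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad A^+ = \begin{bmatrix} \tfrac{1}{2} & \tfrac{1}{2} \end{bmatrix}.$$ Then $A^+ A = 1$ and $A A^+ = \begin{bmatrix} \tfrac{1}{2} & \tfrac{1}{2} \\ \tfrac{1}{2} & \tfrac{1}{2} \end{bmatrix}$ are both symmetric, and $A A^+ A = A \cdot 1 = A$ and $A^+ A A^+ = 1 \cdot A^+ = A^+$, so all four defining conditions hold. The system $A x = b$ with $b = (1, 3)$ asks for a single number $x$ satisfying both $x = 1$ and $x = 3$, so it has no solution. The pseudoinverse gives $$x = A^+ b = \tfrac{1}{2}\cdot 1 + \tfrac{1}{2} \cdot 3 = 2,$$ which is the minimizer of $\|Ax - b\|^2 = (x-1)^2 + (x-3)^2$.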

The pseudoinverse is available in most major linear algebra systems. It can be computed using the singular value decomposition (SVD), which is itself one of the most useful tools in scientific computing (see Appendix B.2).

Positive definiteness¶

An $n\times n$ matrix $A$ is called symmetric positive definite (s.p.d.), or just positive definite, if it is symmetric ($A=A^T$) and satisfies the following condition: $$\V{x}^T A \V{x} > 0 \text{ for all nonzero }\V{x}\in \mathbb{R}^n.$$ (The zero vector must be excluded, since $\V{0}^T A \V{0} = 0$ for every matrix $A$.) For example, an identity matrix is s.p.d., as is any matrix $A = B^T B$ with $B$ a matrix with rank $n$. An s.p.d. matrix is invertible.

A matrix is positive semi-definite (p.s.d.) if the strict positivity condition is replaced with a nonnegativity condition: $$\V{x}^T A \V{x} \geq 0 \text{ for all }\V{x}\in \mathbb{R}^n.$$ Any s.p.d. matrix is also p.s.d., and any matrix $A=B^TB$ is also p.s.d.
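The claims about $A = B^T B$ follow from a short calculation. First, $A$ is symmetric because $(B^T B)^T = B^T (B^T)^T = B^T B$. Second, for any $\V{x} \in \mathbb{R}^n$, $$\V{x}^T B^T B \V{x} = (B\V{x})^T (B\V{x}) = \|B\V{x}\|^2 \geq 0,$$ so $A$ is p.s.d. If in addition $B$ has rank $n$, then $B\V{x} = \V{0}$ only when $\V{x} = \V{0}$, so $\|B\V{x}\|^2$ is strictly positive for every nonzero $\V{x}$.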