How to Matrix: A Simple Quick Reference

Matrix Quick Reference

Introduction

This is a basic reference for the fundamentals of vectors and matrices: What is their structure? What operations can be performed on them? What are their special properties?

This is intended to be a quick lookup resource for how to read and manipulate matrix expressions and formulas. This is NOT a primer on linear algebra, which is a much more expansive and complex topic.


Note on Representation / Notation

Matrices and vectors have a wide variety of symbolic representations. This source uses lowercase letters for vectors (\(v\)) and capital letters for matrices (\(M\)), while others capitalize vectors as well (\(V\)). Some use arrows (\(\overrightarrow{v}\)) for vectors and some incorporate boldface. Sadly, there is no set standard and many different notations are common, so you need to be very careful when using multiple sources: the same symbols can be used very differently.


Scalar vs Vector vs Matrix

\[\large
\begin{align}
c = \stackrel{\text{Scalar}\\}{5} &&
v = \stackrel{\text{Vector}\\}{\begin{bmatrix} 9 \\ -2 \\ 2 \end{bmatrix}} &&
X = \stackrel{\text{Matrix}\\}{
\begin{bmatrix} 0 & 6 \\
1 & 8
\end{bmatrix}
}
\end{align}
\]


Scalar

Scalars are 0-dimensional values.
\[\large
\begin{align}
c = 5 && (\normalsize \text{scalar})
\end{align}
\]


Vector

Vectors are 1-dimensional arrays of values. Vectors come in two flavors: column and row. An \(n\)-dimensional column vector will have 1 “column” containing \(n\) elements; an \(n\)-dimensional row vector will have 1 “row” containing \(n\) elements.
\[\large
\begin{align}
a = \begin{bmatrix} 1\\2\\3 \end{bmatrix} && b = \begin{bmatrix} 4&5&6 \end{bmatrix} && \normalsize(\text{column and row vector})
\end{align}
\]
If the type of the vector is not specified, or it does not matter for the problem or operation at hand, assume a column vector.


Subdividing Vectors

This is NOT an orthodox math representation, but is common in programming.
\[\large
\begin{align}
&v = \begin{bmatrix} a \\ b \\ c \end{bmatrix} && & &v_2 = \begin{bmatrix} b \end{bmatrix} \\ \\
&w = \begin{bmatrix} a & b & c \end{bmatrix} && & &w_3 = \begin{bmatrix} c \end{bmatrix} \\ \\
&x = \begin{bmatrix} a \\ b \\ c \end{bmatrix} && & &x_{1:2} = \begin{bmatrix} a \\ b \end{bmatrix}
\end{align}
\]
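A minimal R sketch of the same subsetting, using hypothetical values (note that a plain R vector is neither a row nor a column vector until it is used in matrix algebra):

# hypothetical vector
v <- c("a", "b", "c")

v[2]      # single element: "b"
v[1:2]    # slice of elements 1 through 2: "a" "b"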


Matrix

Matrices are 2-dimensional rectangular arrays of values. An \(r\times c\) matrix will have \(r\) rows and \(c\) columns.

Vectors and scalars can be generalized as matrices, i.e. a scalar can be treated as a \(1\times1\) matrix and a vector can be treated as an \(n\times 1\) or \(1\times n\) matrix.

\[\large
\begin{align}
A = \stackrel{3\times 2}{\begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}} &&
B = \stackrel{2\times 2}{\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}} &&
C = \stackrel{2\times 3}{\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}} && \normalsize (\text{matrices})
\end{align}
\]


Subdividing Matrices

This is NOT a standard math representation, but it is common in programming.
\[\large
\begin{align}
&V = \begin{bmatrix} a & b & c \\
d & e & f \\
g & h & i \end{bmatrix} && & &V_{[2,]} = \begin{bmatrix} d & e & f \end{bmatrix} \\ \\
&V = \begin{bmatrix} a & b & c \\
d & e & f \\
g & h & i \end{bmatrix} && & &V_{[,3]} = \begin{bmatrix} c \\ f \\ i \end{bmatrix} \\ \\
&V = \begin{bmatrix} a & b & c \\
d & e & f \\
g & h & i \end{bmatrix} && & &V_{[2,1]} = \begin{bmatrix} d \end{bmatrix} \\ \\
&V = \begin{bmatrix} a & b & c \\
d & e & f \\
g & h & i \end{bmatrix} && & &V_{[3,2:3]} = \begin{bmatrix} h & i \end{bmatrix}
\end{align}
\]
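A minimal R sketch of the same matrix subsetting, using a hypothetical 3 x 3 matrix:

V <- matrix(c("a","b","c",
              "d","e","f",
              "g","h","i"), nrow = 3, byrow = TRUE)

V[2, ]      # 2nd row:          "d" "e" "f"
V[ , 3]     # 3rd column:       "c" "f" "i"
V[2, 1]     # single element:   "d"
V[3, 2:3]   # part of 3rd row:  "h" "i"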


Functions


Addition

Two matrices or vectors of exactly the same dimensions can be combined through addition. The value in one matrix is added to the value in its corresponding position in the other.
\[\large
\begin{align}
\stackrel{r\times c}{A} + \stackrel{r\times c}{B} &= \stackrel{r\times c}{C} \\ \\
\stackrel{a}{\begin{bmatrix}
8 \\ 3 \\ 9
\end{bmatrix}}
+
\stackrel{b}{\begin{bmatrix}
0 \\ 4 \\ 9
\end{bmatrix}}
&=
\stackrel{c}{\begin{bmatrix}
8+0 \\3+4 \\ 9+9
\end{bmatrix}} \\ \\
\stackrel{A}{\begin{bmatrix}
3 & 3 & 0 \\
4 & 2 & 9
\end{bmatrix}}
+
\stackrel{B}{\begin{bmatrix}
3 & 9 & 5 \\
1 & 6 & 1
\end{bmatrix}}
&=
\stackrel{C}{\begin{bmatrix}
3+3 & 3+9 & 0+5 \\
4+1 & 2+6 & 9+1
\end{bmatrix}} \\

\end{align}
\]
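A quick R check of the examples above (the + operator adds element by element and requires matching dimensions):

a <- c(8, 3, 9)
b <- c(0, 4, 9)
a + b    # 8 7 18

A <- rbind(c(3, 3, 0), c(4, 2, 9))
B <- rbind(c(3, 9, 5), c(1, 6, 1))
A + B    # element-wise sums, same dimensions as A and B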


Scalar Multiplication

Multiplying a vector or matrix by a scalar (sometimes called “scaling”) simply multiplies every element by the scalar.
\[\large
\begin{align}
X = \begin{bmatrix}
1 & 2 \\
3 & -5
\end{bmatrix} &&
5X = \begin{bmatrix}
5* 1 & 5* 2 \\
5* 3 &5*-5
\end{bmatrix}
\end{align}
\]
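In R, multiplying a matrix by a scalar with * scales every element:

X <- rbind(c(1, 2), c(3, -5))
5 * X    # every element multiplied by 5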


Vector Dot Product

Two vectors of the same dimensions can be inputs to a dot product. The output of the dot product is a scalar. The \(i\)th element of each vector is multiplied together and those products are summed.

The vector dot product is unaffected by whether its inputs are row or column vectors (any combination works). The only requirement is that the two vectors have the same dimension, i.e. the same number of elements.
\[\large
\begin{align}
\stackrel{x}{\begin{bmatrix}7 \\ 0 \\ 1\end{bmatrix}} \cdot \stackrel{y}{\begin{bmatrix}9 \\ 7 \\ 6\end{bmatrix}} &= 7*9 + 0*7 + 1*6 = \stackrel{z}{69}
\end{align}
\]
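A minimal R sketch of the dot product above; sum(x * y) works on plain vectors, while t(x) %*% y treats them as a 1 x n and an n x 1 matrix:

x <- c(7, 0, 1)
y <- c(9, 7, 6)

sum(x * y)    # 69
t(x) %*% y    # 1 x 1 matrix containing 69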


Sum of Squares

You can represent the sum of squared values as the dot product of two identical vectors.
\[\large
\begin{align}
\sum_{i=1}^n x_i^2 &= x\cdot x \\
\stackrel{x}{\begin{bmatrix}1\\2\\4\end{bmatrix}} \cdot \stackrel{x}{\begin{bmatrix}1\\2\\4\end{bmatrix}} &= 1^2 + 2^2 + 4^2 =14
\end{align}
\]


Matrix Multiplication

[Figure: Matrix multiplication is "row to column."]

In the figure, the first row of the left matrix and the first column of the right matrix are both vectors with equal dimensions; their dot product forms the 1st-row, 1st-column value of the product matrix.

Likewise, the dot product of the left matrix's 3rd row vector and the right matrix's 2nd column vector forms the 3rd-row, 2nd-column value of the product. In practice, the matrices are displayed side by side as below.
\[\large
\begin{align}
\stackrel{X}{
\begin{bmatrix}
1&2\\
3&4\\
5&6
\end{bmatrix}
}
\stackrel{Y}{
\begin{bmatrix}
7&9\\
8&0
\end{bmatrix}
} &=
\stackrel{Z}{
\begin{bmatrix}
1*7+2*8 & 1*9 + 2*0 \\
3*7+4*8 & 3*9+4*0 \\
5*7+6*8 & 5*9+6*0
\end{bmatrix}
}
\end{align}
\]
In order for two matrices to be multiplied, the column dimension of the left matrix must equal the row dimension of the right matrix. Matrix multiplication therefore does not follow the commutative rule (i.e. \(XY \ne YX\) in general); the order of multiplication matters, as you can see above. Additionally, the dimensions of the product matrix are determined by both the left and right matrices, as shown below.
\[\large
\begin{align}
\stackrel{\Large\color{green}n\color{gray}\times \color{gold}q}{X} \times \stackrel{\Large\color{gold}q \color{gray} \times \color{blue}m}{Y} &= \stackrel{\Large\color{green}n\color{gray}\times \color{blue}m}{Z} && \normalsize (\text{left columns must match right rows})
\end{align}
\]
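The same multiplication in R, showing that the order matters and that the dimensions must conform:

X <- rbind(c(1, 2), c(3, 4), c(5, 6))   # 3 x 2
Y <- rbind(c(7, 9), c(8, 0))            # 2 x 2

X %*% Y    # 3 x 2 product, matching Z above
# Y %*% X would throw a "non-conformable arguments" error:
# Y has 2 columns but X has 3 rows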


Summation of a Matrix or Vector

\[\large
\begin{align}
&X = \stackrel{n\times k}{\begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}} && & &\sum X = \sum_{i=1}^n\sum_{j=1}^k X_{[i,j]} \\ \\
&v = \stackrel{1\times p}{\begin{bmatrix} a & b & c \end{bmatrix}} && & &\sum v = \sum_{i=1}^p v_i \\ \\
&v = \stackrel{p\times 1}{\begin{bmatrix} a \\ b \\ c \end{bmatrix}} && & &\sum v = \sum_{i=1}^p v_i
\end{align}
\]
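In R, sum() collapses a matrix or vector to the total of all its elements:

X <- rbind(c(1, 2, 3), c(4, 5, 6), c(7, 8, 9))
sum(X)    # 45

v <- c(1, 2, 3)
sum(v)    # 6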


Transposition

Transposing a vector or matrix essentially swaps the row and column positions of all values. So, for example, the 2nd-row, 1st-column value of a matrix moves to the 1st-row, 2nd-column position in the transposed matrix (\(X^\mathrm{T} = Y\longrightarrow X_{[2,1]} = Y_{[1,2]}\)).
\[\large
\begin{align}
v = \stackrel{3\times 1}{\begin{bmatrix} 2 \\ 8 \\ 0 \end{bmatrix}} &&
v^\mathrm{T} = \stackrel{1\times 3}{\begin{bmatrix} 2 & 8 & 0 \end{bmatrix}} \\
X = \stackrel{2\times 3}{\begin{bmatrix} 0 & 1 & 4 \\
2 & 5 & 3 \end{bmatrix}} &&
X^\mathrm{T} = \stackrel{3\times 2}{\begin{bmatrix} 0 & 2 \\
1 & 5 \\
4 & 3 \end{bmatrix}} \\
\end{align}
\]
When transposing the product of multiple matrices, reverse the order of the multiplication and toggle the transposition on every term.
\[\large
\begin{align}
\Bigg(
\stackrel{x}{\begin{bmatrix} 5 & 7
\end{bmatrix}}
\stackrel{Y}{\begin{bmatrix} 0 & 6 \\
1 & 8
\end{bmatrix}}
\Bigg)^\mathrm{T} &=
\stackrel{Y^\mathrm{T}}{\begin{bmatrix} 0 & 1 \\
6 & 8
\end{bmatrix}}
\stackrel{x^\mathrm{T}}{\begin{bmatrix} 5 \\ 7
\end{bmatrix}}
\\ \\
(AB^\mathrm{T}C + D)^\mathrm{T} &= C^\mathrm{T}BA^\mathrm{T} + D^\mathrm{T}
\end{align}
\]
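A small R check of the reverse-order rule, using the \(x\) and \(Y\) from the example above:

x <- matrix(c(5, 7), nrow = 1)    # 1 x 2 row vector
Y <- rbind(c(0, 6), c(1, 8))

t(x %*% Y)       # transpose of the product...
t(Y) %*% t(x)    # ...equals the reversed product of the transposes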


Sum of Squares

You can represent the sum of squared values as matrix multiplication on a row vector and its column equivalent (or through a transposed column vector multiplied by the un-transposed version).
\[\large
\begin{align}
\sum_{i=1}^n y_i^2 &= y^\mathrm{T}y \\
\stackrel{y^\mathrm{T}}{\begin{bmatrix}1 & 2 & 3\end{bmatrix}}\stackrel{y}{\begin{bmatrix}1 \\ 2 \\ 3\end{bmatrix}} &= 1^2+2^2+3^2 = 14
\end{align}
\]


Symmetrical Matrix

A matrix that is its own transposition is a symmetrical matrix.
\[\large
\begin{align}
A^\mathrm{T} &= A && \normalsize (\text{symmetrical matrix}) \\ \\
\begin{bmatrix}
7&6 \\
6&10
\end{bmatrix}^\mathrm{T} &=
\begin{bmatrix}
7&6 \\
6&10
\end{bmatrix}
\end{align}
\]


Rank

The rank of a matrix is the number of linearly independent row/column vectors. A common method of finding the rank, and a good illustration of what rank is, is to simplify a matrix (\(A\rightarrow A^{*}\)) using the 3 elementary matrix operations:

  1. Scaling: Multiplying a row or column vector by a scalar
  2. Combining: Adding one vector to another
  3. Switching: Switching two vectors' positions in the matrix

Performing these operations on a matrix does NOT change its rank.

If the above set of operations can be used on any set of row or column vectors to produce a 0 vector, then that set is linearly dependent.
\[\large
\begin{align}
&A = \begin{bmatrix}
2&1\\
1&2\\
7&-5
\end{bmatrix} \rightarrow
\stackrel{R_3:+2R_2+R_1 }{\begin{bmatrix}
2&1\\
1&2\\
11&0
\end{bmatrix}} \rightarrow
\stackrel{R_2:-R_3/11}{\begin{bmatrix}
2&1\\
0&2\\
11&0
\end{bmatrix}} \rightarrow
\stackrel{R_1:-R_3/5.5-R_2/2}{\begin{bmatrix}
0&0\\
0&2\\
11&0
\end{bmatrix}} = A^{*} \\ \\
& \text{rank}(A) = \text{rank}(A^{*}) = 2
\end{align}
\]
The set of 3 row vectors from \(A\) is linearly dependent; however, the set of any 2 of the 3 is independent, as evidenced by the reduced form. Below are some other important properties of rank (or properties controlled by it).

  • Transposing a matrix does not change its rank; therefore the row and column ranks of a matrix cannot differ, and the maximum rank of a matrix is the smaller of its row/column dimensions.
  • A matrix that has its maximal rank is said to be full rank.
  • A square matrix with less than full rank has no inverse.
  • The maximum rank of the product of two matrices is the smaller of the ranks of the two factors.
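In R, one common way to check the rank of a matrix is through its QR decomposition (covered later in this reference):

A <- rbind(c(2, 1), c(1, 2), c(7, -5))
qr(A)$rank    # 2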


Inverse

Any square matrix of full rank has an inverse. When the original matrix and the inverse are multiplied together the product is a special matrix called the identity matrix. This is a generalization of scalar inverses to matrices.
\[\large
\begin{align}
& A= 6 && B = \begin{bmatrix} 4 & 7 \\ 2 & 6 \end{bmatrix} \\ \\
&A^{-1} = \frac{1}{6} && B^{-1} = \begin{bmatrix} 0.6 & -0.7 \\ -0.2 & 0.4 \end{bmatrix} \\ \\
&AA^{-1} = A^{-1}A = I && BB^{-1}=B^{-1}B = I \\ \\
&6 \times \frac{1}{6} = 1 && \begin{bmatrix} 4 & 7 \\ 2 & 6 \end{bmatrix}\begin{bmatrix} 0.6 & -0.7 \\ -0.2 & 0.4 \end{bmatrix}=\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\end{align}
\]
For a matrix which is the product of square factors, its inverse can be represented as the product of the factors' inverses in reverse order, with the inversion toggled on each factor.
\[\large
\begin{align}
X &= ABC^{-1} && & X^{-1} &= CB^{-1}A^{-1} && \normalsize (A,B,C\text{ are all square}) \\ \\
X &= A^{-1}\stackrel{3\times 2\cdot 2\times 3}{BC} && & X^{-1} &= (BC)^{-1}A &&\normalsize (B\text{ and }C\text{ are not square, but }BC\text{ is})
\end{align}
\]
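In R, solve() with a single argument returns the inverse of a square, full rank matrix:

B <- rbind(c(4, 7), c(2, 6))
solve(B)          # the inverse shown above
B %*% solve(B)    # the 2 x 2 identity matrix (up to floating-point error)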

Identity Matrix

The identity matrix, \(I\), is a square matrix containing all 0’s except for the main diagonal, which contains all 1’s.
\[\large
I = \begin{bmatrix}
1 & 0 & \ldots & 0 \\
0 & 1 & \ldots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \ldots & 1
\end{bmatrix}
\]
The identity matrix is a generalization of the scalar multiplicative identity (i.e. anything multiplied by the identity matrix is itself).
\[
\begin{align}
& 5 * 1 = 5 && \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 7 & 9 & 2 \\ 1 & 9 & 4 \\ 4 & 3 & 8 \end{bmatrix} = \begin{bmatrix} 7 & 9 & 2 \\ 1 & 9 & 4 \\ 4 & 3 & 8 \end{bmatrix} & \\ \\
& \begin{bmatrix}0 & 6 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 6 \end{bmatrix} && \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 0 \\ 6 \end{bmatrix} = \begin{bmatrix} 0 \\ 6 \end{bmatrix} & \\ \\
& Iv = v && AIB = AB &
\end{align}
\]


Orthogonal and Semi-Orthogonal Matrices

NOTE: Texts commonly use the term orthogonal when talking about matrices that are only semi-orthogonal. Basically, if you see the term “orthogonal matrix,” you can assume it is orthogonal at least one way and possibly fully orthogonal.

Orthogonal matrices are square matrices that produce the identity matrix when multiplied by their transposition, in either order.
\[\large
\begin{align}
A = \begin{bmatrix}
1/3 & -2/3 & 2/3 \\
2/3 & -1/3 & -2/3 \\
2/3 & 2/3 & 1/3
\end{bmatrix} && A^\mathrm{T}A &= AA^\mathrm{T} = I && \normalsize (\text{orthogonal matrix})
\end{align}
\]
Semi-orthogonal matrices are rectangular and are also referred to as row or column orthogonal (whichever set of vectors is longer). These matrices produce the identity matrix when multiplied by their transposition from one side, BUT NOT BOTH.
\[\large
\begin{align}
B & = \begin{bmatrix}
1/3 & -2/3 \\
2/3 & -1/3 \\
2/3 & 2/3
\end{bmatrix} && & B^\mathrm{T}B &= I && \normalsize (\text{columns are orthogonal}) \\
C & = \begin{bmatrix}
1/3 & 2/3 & 2/3 \\
-2/3 & -1/3 & 2/3
\end{bmatrix} && & CC^\mathrm{T} &= I && \normalsize (\text{rows are orthogonal})
\end{align}
\]
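An R check of both cases, using the matrices above (round() just hides floating-point noise):

A <- rbind(c(1, -2,  2),
           c(2, -1, -2),
           c(2,  2,  1)) / 3

round(t(A) %*% A, 10)    # identity matrix
round(A %*% t(A), 10)    # identity matrix

B <- A[, 1:2]            # tall, semi-orthogonal (first two columns of A)
round(t(B) %*% B, 10)    # 2 x 2 identity
round(B %*% t(B), 10)    # NOT the identity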


Diagonal

Diagonal (\(\text{diag}\)) is a function that, applied to a vector, produces a diagonal matrix, and, applied to a matrix, produces the vector of its main diagonal.
\[\large
\begin{align}
a &= \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} && & \text{diag}(a) &= \begin{bmatrix} 1&0&0 \\ 0&2&0 \\ 0&0&3 \end{bmatrix} \\ \\
B &= \begin{bmatrix} 1&2&3 \\ 4&5&6 \\ 7&8&9 \end{bmatrix} && & \text{diag}(B) &= \begin{bmatrix} 1 \\ 5 \\ 9 \end{bmatrix} \\ \\
C &= \begin{bmatrix} 1&2&3 \\ 4&5&6 \end{bmatrix} && & \text{diag}(C) &= \begin{bmatrix} 1 \\ 5 \end{bmatrix}
\end{align}
\]
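R's built-in diag() function behaves this way (with one caveat: called on a single number, e.g. diag(3), it returns the 3 x 3 identity matrix):

diag(c(1, 2, 3))                                 # vector -> diagonal matrix

B <- rbind(c(1, 2, 3), c(4, 5, 6), c(7, 8, 9))
diag(B)                                          # matrix -> main diagonal: 1 5 9

C <- rbind(c(1, 2, 3), c(4, 5, 6))
diag(C)                                          # rectangular matrix: 1 5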


QR Decomposition

Every matrix can be factored into two matrices, denoted \(Q\) and \(R\). These factors have properties that can be used to reduce matrix expressions into forms that are more efficiently computed than the non-factored form.

You do not need to know how to calculate the factorization to use its properties to simplify some matrix expressions (e.g. \((X^\mathrm{T}X)^{-1}X^\mathrm{T}y\) where \(n \ge m\) can be reduced to \(R_X^{-1}Q_X^\mathrm{T}y\), which is less computationally intensive because \(R_X\) has a known pattern of zeros in it; see the R sketch after the list below).

  • Let \(A\) be an \(n\times m\) matrix. \(A\) can be factored into \(A = Q_AR_A\).
  • The \(Q\) factor is an orthogonal matrix.
  • The \(Q\) factor has the same dimensions as \(A\) when \(n > m\).
  • The \(Q\) factor is square when \(n\le m\).
  • The \(R\) factor is an upper triangular matrix (i.e. all values below its main diagonal are 0).
  • The \(R\) factor is square when \(n \ge m\).
  • The \(R\) factor has the same dimensions as \(A\) when \(n < m\).
  • The \(R\) factor has the same rank as \(A\).
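A minimal R sketch of the least-squares simplification mentioned above, using hypothetical random data (qr(), qr.Q(), and qr.R() are base R functions):

set.seed(1)
X <- matrix(rnorm(12), nrow = 4)   # hypothetical 4 x 3 matrix (n >= m)
y <- rnorm(4)

decomp <- qr(X)
Q <- qr.Q(decomp)
R <- qr.R(decomp)

# both lines give the same result
solve(t(X) %*% X) %*% t(X) %*% y
solve(R) %*% t(Q) %*% y            # or backsolve(R, t(Q) %*% y)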


Element-wise Operations

In programming contexts, it is possible to use functions that operate on the elements of a matrix individually. These are almost never presented in mathematics contexts, although they can be useful there, for example to represent that a vector of standard deviations can be extracted from a covariance matrix.

To represent an element-wise operation, I surround the function with brackets, as seen below. This means the function is applied to each element of the matrix individually.

\[\large{
\begin{align}
K = \begin{bmatrix} \sigma_X^2 & \sigma_{XY} \\ \sigma_{XY} & \sigma_Y^2 \end{bmatrix} && [\sqrt{\text{diag}(K)}] = \begin{bmatrix} \sigma_X \\ \sigma_Y \end{bmatrix}
\end{align}
}\]
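A numeric sketch of that covariance example in R, with hypothetical values:

# hypothetical 2 x 2 covariance matrix
K <- rbind(c(4.0, 1.5),
           c(1.5, 9.0))

sqrt(diag(K))    # element-wise square root of the diagonal: 2 3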
Element-wise multiplication between two matrices of exactly the same dimensions produces a matrix of those same dimensions. In the example below, the elements in corresponding positions in the two matrices are multiplied together and each result is placed in that same position in the product matrix.
\[\large
\begin{align}
X = \begin{bmatrix} 10 & 20 & 30 \\
40 & 50 & 60 \\
70 & 80 & 90
\end{bmatrix} && Y =\begin{bmatrix} 1 & 2 & 3 \\
4 & 5 & 6 \\
7 & 8 & 9
\end{bmatrix} && [X \times Y] = \begin{bmatrix} 10 \times 1 & 20 \times 2 & 30 \times 3 \\
40 \times 4 & 50 \times 5 & 60 \times 6 \\
70 \times 7 & 80 \times 8 & 90 \times 9
\end{bmatrix}
\end{align}
\]

# define matrices
X <- rbind(
   c(10,20,30)
  ,c(40,50,60)
  ,c(70,80,90)
)

Y <- rbind(
   c(1,2,3)
  ,c(4,5,6)
  ,c(7,8,9)
)

# matrix multiplication
X %*% Y

# "element-wise" multiplication
X * Y


A Note on Phraseology

It can be unclear when reading exactly what operation is being referenced, so be careful. For example, look at the various ways in which “cubed” and “squared” can be interpreted when discussing matrices.

X <- cbind(c(1,2),c(2,5))

# elements of a matrix raised to power 3 i.e. "cubed"
X^3

# square matrix raised to power of 3 i.e. "cubed"
X %*% X %*% X


W <- cbind(c(1,2,3),c(4,5,6))
Z <- rbind(c(7,8,9),c(10,11,12))

# elements of a matrix raised to power 2 i.e. "squared"
W^2
Z^2

# matrix "squared" by multiplying by its transposition
W %*% t(W)
Z %*% t(Z)

# "square root" of every element in a matrix
sqrt(X)

# X is the "square root" of V, V = X %*% X, V %*% X^-1 = X
X
V <- X %*% X
V %*% solve(X)

# The identity matrix has an infinite number of "square roots."
# Every symmetric orthogonal matrix is a "square root" of I since
# for such a matrix Q, Q %*% Q = Q %*% t(Q) = I.
