Introduction

Humanity has always tackled challenges by breaking them down into smaller pieces. We split a goal into several tasks, a recipe into a sequence of steps, and I am sure you are familiar with the term divide and conquer. Look at this example. We know that \(18\) can be decomposed into the factors \(2\) and \(3\) such that \(18 = 2 \times 3 \times 3\). From that knowledge, we can infer that \(18\) and all its multiples are divisible by \(2\). As we can see, decomposition can be incredibly useful for discovering properties that would remain invisible if the problem were considered in its entirety.

The technique we will discuss in this post is all about decomposition, but with matrices, and it is called Singular Value Decomposition. This blog post is the second part of a series on SVD.

If you are not familiar with terms such as Vector Spaces and Orthogonality, definitely check the first part to make reading this one smooth.

Singular Value Decomposition

Singular Value Decomposition (SVD for short) is a method used to factorize any rectangular matrix \(\boldsymbol{M} \in \mathbb{R}^{m \times n}\). Why is it useful? Let’s take a look at an example. Consider the function below:

$$ f(x) = x^2 + 2x + 1 $$

Assuming you did your maths well in high school, you would find little to no problem in confirming that the following is true:

$$ f(x) = x^2 + 2x + 1 = (x + 1)(x + 1) $$

Therefore, we see that this mildly complex function becomes clear when it is decomposed, right? It just takes a value \(x\), adds one to it and squares the result. In the last post, we said that scalar functions are transformations that map between scalar values, while vector transformations are transformations of vectors. This means that, if we can decompose functions and thereby discover interesting properties in them, the same should also hold for vector transformations. Additionally, we said that some linear vector transformations can be represented as matrices. As a result, a matrix decomposition technique like SVD can be very helpful for understanding how vectors are transformed.

Now, let’s dive into the subject matter. Singular Value Decomposition is simply a method used to decompose any rectangular matrix \(\boldsymbol{M} \in \mathbb{R}^{m \times n}\) such that:

$$ \boldsymbol{M} = \boldsymbol{U} \boldsymbol{\Sigma} \boldsymbol{V}^{T} $$

Therefore, \(\boldsymbol{M}\) can be represented as the product of three matrices: two orthogonal square matrices, \(\boldsymbol{U} \in \mathbb{R}^{m \times m}\) and \(\boldsymbol{V} \in \mathbb{R}^{n \times n}\), as well as a diagonal matrix \(\boldsymbol{\Sigma} \in \mathbb{R}^{m \times n}\) with non-negative entries only.

Fig. 1. Singular value decomposition.
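If you would like to see this in practice, here is a minimal NumPy sketch (the matrix \(\boldsymbol{M}\) is an arbitrary example chosen for illustration, not anything specific to this post) that computes the decomposition and checks the properties we just described:

```python
import numpy as np

# A small rectangular matrix M (4 x 3), chosen arbitrarily for illustration.
M = np.array([[3.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 4.0],
              [2.0, 0.0, 1.0]])
m, n = M.shape

# Full SVD: U is m x m, Vt is n x n, s holds the singular values.
U, s, Vt = np.linalg.svd(M, full_matrices=True)

# Rebuild the m x n diagonal matrix Sigma from the singular values.
Sigma = np.zeros((m, n))
Sigma[:len(s), :len(s)] = np.diag(s)

# U and V are orthogonal, Sigma is diagonal with non-negative entries,
# and their product reconstructs M.
print(np.allclose(U @ U.T, np.eye(m)))    # True
print(np.allclose(Vt @ Vt.T, np.eye(n)))  # True
print(np.all(s >= 0))                     # True
print(np.allclose(M, U @ Sigma @ Vt))     # True
```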

Each of the matrices \(\boldsymbol{U}\), \(\boldsymbol{V}\) and \(\boldsymbol{\Sigma}\) has a property that characterizes SVD in a way that makes matrix operations much more intuitive than if we solely focused on the original matrix. We will look at each of them by considering an example of a vector \(\boldsymbol{v}\) belonging to the vector space \(\mathbb{R}^n\), transformed by \(\boldsymbol{M}\) through a matrix-vector multiplication. We know that the result of this operation will be a vector in the space \(\mathbb{R}^m\), but what happens under the hood is a bit obscure to us. As surprising as it might seem, it turns out SVD is a great framework for dissecting the operation and unveiling its mysteries.

We will start by representing the vector transformation of \(\boldsymbol{M}\) in terms of its decomposition. This means that:

$$ \boldsymbol{M}\boldsymbol{v} = \boldsymbol{U} \boldsymbol{\Sigma} \boldsymbol{V}^{T}\boldsymbol{v} $$

and the transformation above follows an order such that:

$$ \boldsymbol{U} \boldsymbol{\Sigma} \boldsymbol{V}^{T}\boldsymbol{v} = (\boldsymbol{U} (\boldsymbol{\Sigma} (\boldsymbol{V}^{T}\boldsymbol{v}))) $$
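To convince ourselves that nothing changes when we apply the three factors one at a time, from right to left, here is a small sketch (again with an arbitrary matrix and vector made up for illustration) comparing \(\boldsymbol{M}\boldsymbol{v}\) with the step-by-step product:

```python
import numpy as np

# Arbitrary example matrix (3 x 2) and a vector v in R^2.
M = np.array([[2.0, 0.0],
              [1.0, 3.0],
              [0.0, 1.0]])
v = np.array([1.0, 2.0])

U, s, Vt = np.linalg.svd(M, full_matrices=True)
Sigma = np.zeros(M.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

# Apply the factors from right to left, one step at a time.
step1 = Vt @ v         # rotation/reflection in R^2
step2 = Sigma @ step1  # scaling (and change of dimension) into R^3
step3 = U @ step2      # rotation/reflection in R^3

print(np.allclose(M @ v, step3))  # True: same result as applying M directly
```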

Let us start by analyzing what happens in the innermost bracket.

\(\boldsymbol{V}^{T}\boldsymbol{v}\)

The first step is to multiply the vector by the matrix \(\boldsymbol{V}^{T}\). As you may recall, the matrix \(\boldsymbol{V}\) is orthogonal. Therefore, it follows that its transpose is also orthogonal. Something interesting about orthogonal matrices is that they preserve the length of a vector when multiplied with it. To prove this, we will calculate the length of the vector resulting from the operation in the innermost bracket. We have that:

$$ \begin{align} \lVert \boldsymbol{V}^{T}\boldsymbol{v} \rVert^2 &= (\boldsymbol{V}^{T}\boldsymbol{v})^T(\boldsymbol{V}^{T}\boldsymbol{v}) \\ &= \boldsymbol{v}^T\boldsymbol{V}\boldsymbol{V}^{T}\boldsymbol{v} \end{align} $$

but \(\boldsymbol{V}\boldsymbol{V}^{T} = \boldsymbol{I}\) since \(\boldsymbol{V}\) is orthogonal (see proof). Therefore,

$$ \begin{align} \lVert \boldsymbol{V}^{T}\boldsymbol{v} \rVert^2 = \boldsymbol{v}^T\boldsymbol{v} = \lVert \boldsymbol{v} \rVert^2 \end{align} $$

Since the squared lengths agree, so do the lengths themselves: multiplying by an orthogonal matrix such as \(\boldsymbol{V}\) or \(\boldsymbol{V}^T\) cannot alter the length of the vector \(\boldsymbol{v}\). Therefore, this operation can only flip or rotate the vector \(\boldsymbol{v}\), as follows:

Fig. 2. Vector rotation by the orthogonal matrix.

This example shows a vector with coordinates in the \(x\)-\(y\) plane being rotated counter-clockwise. The orientation of the original vector is shown in orange and the rotated vector is shown in blue. If we look at the left plane in the figure above and compare the two vectors in terms of the original \(x\)-\(y\) plane (also rotated), it seems like the new vector has the same coordinates as the original vector, while the original vector has changed coordinates. But we can also look at those vectors in terms of the original \(x\)-\(y\) plane, as in the figure below:

Fig. 3. Vector rotation in terms of 2 different x-y planes.

If you look closely at the two planes, the two vectors did not move a single inch. However, depending on which plane we look at, these vectors have different coordinates with respect to that plane. This is an important concept related to vector spaces we did not touch on in the last post: the transformation of vector spaces. For simplicity, just note that \(\boldsymbol{V}^T\) transforms a vector such that the resulting vector is equivalent to the original (the same vector), but expressed with respect to another vector space.

In conclusion, the first step of the transformation by \(\boldsymbol{M}\) on \(\boldsymbol{v}\), with respect to SVD, is a rotation or reflection of \(\boldsymbol{v}\). Now, let’s assume that the result of this operation is \(\boldsymbol{w}\). The next step is a transformation by the diagonal matrix \(\boldsymbol{\Sigma}\).
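As a quick numerical sanity check of the length-preservation argument above, the sketch below (using an arbitrary matrix and an arbitrary vector of length \(5\)) verifies that \(\lVert\boldsymbol{V}^T\boldsymbol{v}\rVert = \lVert\boldsymbol{v}\rVert\):

```python
import numpy as np

# Arbitrary matrix; we only need the orthogonal factor V from its SVD.
M = np.array([[2.0, 1.0],
              [0.0, 3.0],
              [1.0, 1.0]])
_, _, Vt = np.linalg.svd(M)

v = np.array([3.0, 4.0])   # an arbitrary vector with length 5
w = Vt @ v                 # the first step of the transformation

# The length is untouched: V^T can only rotate or reflect v.
print(np.linalg.norm(v))   # 5.0
print(np.linalg.norm(w))   # 5.0 (up to floating-point error)
```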

\(\boldsymbol{\Sigma}\boldsymbol{w}\)

The second step is to multiply the vector obtained from the last step by the matrix \(\boldsymbol{\Sigma}\). This matrix is diagonal. Therefore, when multiplied with a vector, it scales each component of the vector by the corresponding value along its main diagonal (you can try it on paper to get convinced 🙂). Another constraint imposed on this matrix is that all of its diagonal entries must be non-negative, and this will dictate the way this matrix scales a vector.
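If you would rather not do it on paper, here is a tiny sketch (with made-up diagonal entries) showing the component-wise scaling:

```python
import numpy as np

# A square diagonal matrix with non-negative entries (made-up values).
Sigma = np.diag([2.0, 0.5, 3.0])
w = np.array([1.0, 4.0, -2.0])

# Each component of w is scaled by the matching diagonal entry.
print(Sigma @ w)           # [ 2.   2.  -6.]
print(np.diag(Sigma) * w)  # same result, computed component-wise
```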

For the case of SVD, this can be a bit tricky. Why? When decomposing \(\boldsymbol{M}\), \(\boldsymbol{\Sigma}\) can have one of three forms:

  • Square and Symmetric \(\rightarrow\) \(\boldsymbol{\Sigma} \in \mathbb{R}^{m\times n}\) where \(m = n\)
  • Rectangular with more rows \(\rightarrow\) \(\boldsymbol{\Sigma} \in \mathbb{R}^{m\times n}\) where \(m > n\)
  • Rectangular with more columns \(\rightarrow\) \(\boldsymbol{\Sigma} \in \mathbb{R}^{m\times n}\) where \(m < n\)
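A quick way to see all three forms is to decompose matrices of different shapes and inspect \(\boldsymbol{\Sigma}\). The sketch below (using arbitrary random matrices) rebuilds \(\boldsymbol{\Sigma}\) for each case and prints its shape:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigma_of(m, n):
    """Decompose a random m x n matrix and rebuild its m x n Sigma."""
    M = rng.standard_normal((m, n))
    _, s, _ = np.linalg.svd(M, full_matrices=True)
    Sigma = np.zeros((m, n))
    Sigma[:len(s), :len(s)] = np.diag(s)
    return Sigma

print(sigma_of(3, 3).shape)  # (3, 3) -> square and symmetric
print(sigma_of(6, 3).shape)  # (6, 3) -> more rows than columns
print(sigma_of(3, 6).shape)  # (3, 6) -> more columns than rows
```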

Let’s look at each case one by one.

Square and Symmetric

When \(\boldsymbol{\Sigma}\) is square and symmetric, we scale each component of \(\boldsymbol{w}\) by the corresponding value on the diagonal. An example can be seen in the figure below.

Fig. 4. Vector scaling.

Recall that all the diagonal entries of \(\boldsymbol{\Sigma}\) are constrained to non-negative values. Therefore, this operation never flips the sign of any component of the vector. In a 2-D space, this equates to preserving the quadrant of the original vector. So, if we take the example of a set \(\mathcal{F}\) of vectors forming one-fourth of a circle with radius \(2\) and apply the current operation to each vector in this set, then we get a stretched-out quarter circle, as you can see in the figure below.

Fig. 5. Vector space stretched by the square and symmetric matrix described earlier.
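To reproduce something like Fig. 5 numerically, the sketch below (with made-up diagonal values) applies a square diagonal \(\boldsymbol{\Sigma}\) to points on a quarter circle of radius \(2\) and checks that every point stays in the first quadrant while being stretched:

```python
import numpy as np

# Square diagonal Sigma with non-negative entries (made-up values).
Sigma = np.diag([3.0, 1.5])

# Points on one-fourth of a circle of radius 2 (first quadrant).
angles = np.linspace(0.0, np.pi / 2, 50)
quarter_circle = 2.0 * np.stack([np.cos(angles), np.sin(angles)])  # shape (2, 50)

# Scale every vector; x is stretched by 3, y by 1.5.
stretched = Sigma @ quarter_circle

# No component changes sign, so every vector stays in the same quadrant.
print(np.all(stretched >= 0))             # True
print(stretched[:, 0], stretched[:, -1])  # [6. 0.] and (approximately) [0. 3.]
```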

Rectangular with more rows

This case arises when the original matrix that was decomposed also has more rows than columns. In other words, the transformation done by this matrix projects the original vector into a vector space that has more dimensions than the vector space in which the original vector sits.

In this case, when we receive the resulting vector from the previous operation, we expand this vector such that it is scaled along each dimension corresponding to the non-negative diagonal entries of \(\boldsymbol{\Sigma}\). However, this case does something peculiar that differentiates it from the previous one. Since we have more rows than columns, the projected space has more dimensions than the original space, but here is the catch: our projection matrix is diagonal, and all entries other than the diagonal entries are constrained to be \(0\). This means that if our projection matrix is a \(\mathbb{R}^{6 \times 3}\) matrix, it will be of the form

$$ \begin{bmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ \end{bmatrix} $$

This means that the resulting vector is a \(6\)-\(d\) vector (with six components) such that its last three components are \(0\). So, the catch is: in this operation, we stretch our vector into more dimensions, but the magnitude along each of these new dimensions is \(0\).
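As a small check of this behaviour, the sketch below builds a \(6 \times 3\) diagonal matrix of the form shown above (with made-up values for \(a\), \(b\) and \(c\)) and applies it to an arbitrary \(3\)-\(d\) vector:

```python
import numpy as np

# 6 x 3 "diagonal" matrix of the form shown above, with made-up values.
a, b, c = 2.0, 3.0, 0.5
Sigma = np.zeros((6, 3))
Sigma[0, 0], Sigma[1, 1], Sigma[2, 2] = a, b, c

w = np.array([1.0, -2.0, 4.0])

# The result lives in R^6: the first three components are scaled copies
# of w's components, and the last three are all zero.
print(Sigma @ w)  # [ 2. -6.  2.  0.  0.  0.]
```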

Consider the following example, where we have a \(\mathbb{R}^{3 \times 2}\) diagonal matrix with non-negative entries. From our previous discussion, we know that we are stretching a \(2\)-\(d\) vector into a \(3\)-\(d\) vector, but the magnitude along the third dimension is \(0\). The result of such an operation can be seen in the figure below.

TODO: figure of this.

Rectangular with more columns

To be continued…