Reordering Matrix Products
Here are a few operations for dealing with objective functions of matrix-valued variables:
Starting Definitions
Let \(A\) be an \(N\times M\) matrix with \(A_{ij}\) the entry at row \(i\) and column \(j\). The \(i\)'th row of \(A\) is then \(A_{i*}\) and the \(j\)'th column is \(A_{*j}\).
Row and Column Vectorization
Define the column vectorization operator, which returns a single column vector containing the columns of \(A\) stacked:
\[ \cvec(A) = \begin{bmatrix} A_{*1} \\ A_{*2} \\ \vdots \\ A_{*M} \end{bmatrix} \]
Likewise, the row vectorization operator returns a single column vector containing the (transposed) rows of \(A\) stacked:
\[ \rvec(A) = \begin{bmatrix} A_{1*}^T \\ A_{2*}^T \\ \vdots \\ A_{N*}^T \end{bmatrix} = \cvec(A^T) \]
In numpy, arrays are stored in row-major order, so \(\rvec\) maps directly onto a ravel while \(\cvec\) is defined in terms of it:
def cvec( A ):
"""Stack columns of A into single column vector"""
return rvec(A.T)
def rvec( A ):
"""Stack rows of A into a single column vector"""
return A.ravel()
Similarly, the inverses of \(\cvec(A)\) and \(\rvec(A)\) unpack the vectorized values back into a matrix of the original shape:
\[ \rmat(\rvec(A), N) = A, \qquad \cmat(\cvec(A), M) = A \]
with corresponding Python definitions:
def rmat( v, nr ):
"""Reshape vector v into matrix with nr rows"""
return v.reshape((nr,-1))
def cmat( v, nc ):
"""Reshape vector v into matrix with nc columns"""
return v.reshape((nc,-1)).T
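As a quick sanity check, here is a small numpy sketch (the function bodies restate the definitions above) confirming that the reshape operators invert the vectorization operators:

```python
import numpy as np

def rvec(A):
    """Stack rows of A into a single (1-D) vector."""
    return A.ravel()

def cvec(A):
    """Stack columns of A into a single (1-D) vector."""
    return rvec(A.T)

def rmat(v, nr):
    """Reshape vector v into a matrix with nr rows."""
    return v.reshape((nr, -1))

def cmat(v, nc):
    """Reshape vector v into a matrix with nc columns."""
    return v.reshape((nc, -1)).T

A = np.arange(6).reshape(2, 3)                 # [[0, 1, 2], [3, 4, 5]]
assert np.array_equal(rmat(rvec(A), 2), A)     # row round-trip recovers A
assert np.array_equal(cmat(cvec(A), 3), A)     # column round-trip recovers A
```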
Finally, define two additional operators. The first of these is the spread operator, which takes the Kronecker product between an identity matrix and the input matrix, resulting in copies of the input matrix along the diagonal:
\[ \operatorname{spread}(A, n) = I_n \otimes A \]
The second is the split operator, which reverses the order of the arguments to the Kronecker product, essentially replacing each entry \(A_{ij}\) with \(A_{ij} I_{r\times r}\):
\[ \operatorname{split}(A, r) = A \otimes I_r \]
The Python implementations of these are one-liners that just call numpy's existing Kronecker product:
def spread( M, n ):
"""Make n copies of M along the matrix diagonal"""
return np.kron( np.eye(n), M )
def split( M, n ):
"""Replace each entry of M with a n-by-n identity matrix scaled by the entry value"""
return np.kron( M, np.eye(n) )
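A small numeric example (a sketch; `spread` and `split` are the functions just defined) makes the block structure of the two operators visible:

```python
import numpy as np

def spread(M, n):
    """Make n copies of M along the matrix diagonal."""
    return np.kron(np.eye(n), M)

def split(M, n):
    """Replace each entry of M with an n-by-n identity scaled by that entry."""
    return np.kron(M, np.eye(n))

M = np.array([[1., 2.],
              [3., 4.]])
S = spread(M, 2)   # block diagonal: [[M, 0], [0, M]]
T = split(M, 2)    # entries become scaled identities: [[1*I, 2*I], [3*I, 4*I]]

assert np.array_equal(S[:2, :2], M) and np.array_equal(S[2:, 2:], M)
assert np.all(S[:2, 2:] == 0) and np.all(S[2:, :2] == 0)
assert np.array_equal(T[:2, :2], 1. * np.eye(2))
assert np.array_equal(T[:2, 2:], 2. * np.eye(2))
```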
With these, it is possible to rewrite matrix products in vectorized form. There are also some fun properties that come from the Kronecker product:
\[ (A \otimes B)^T = A^T \otimes B^T, \qquad (A \otimes B)^{-1} = A^{-1} \otimes B^{-1} \]
Matrix Products
Let \(X\) be an \(N\times K\) matrix and \(Y\) a \(K\times M\) matrix. Their product \(W = XY\) is then an \(N\times M\) matrix and the following hold:
\[
\begin{aligned}
\cvec(W) &= (I_M \otimes X)\,\cvec(Y) = \operatorname{spread}(X, M)\,\cvec(Y) \\
         &= (Y^T \otimes I_N)\,\cvec(X) = \operatorname{split}(Y^T, N)\,\cvec(X) \\
\rvec(W) &= (X \otimes I_M)\,\rvec(Y) = \operatorname{split}(X, M)\,\rvec(Y) \\
         &= (I_N \otimes Y^T)\,\rvec(X) = \operatorname{spread}(Y^T, N)\,\rvec(X)
\end{aligned}
\]
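These identities are easy to verify numerically. The following numpy sketch (redefining the operators from above for self-containment) checks all four on random matrices:

```python
import numpy as np

cvec = lambda Z: Z.T.ravel()                  # stack columns
rvec = lambda Z: Z.ravel()                    # stack rows
spread = lambda Z, n: np.kron(np.eye(n), Z)   # I_n kron Z
split = lambda Z, n: np.kron(Z, np.eye(n))    # Z kron I_n

rng = np.random.default_rng(0)
N, K, M = 3, 4, 5
X = rng.standard_normal((N, K))
Y = rng.standard_normal((K, M))
W = X @ Y

assert np.allclose(cvec(W), spread(X, M) @ cvec(Y))
assert np.allclose(cvec(W), split(Y.T, N) @ cvec(X))
assert np.allclose(rvec(W), split(X, M) @ rvec(Y))
assert np.allclose(rvec(W), spread(Y.T, N) @ rvec(X))
```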
What these let you do is unbury any (single) term that's in the middle of some horrendous expression. For example, say there's an expression \(W = AXB\) with \(A \in \mathcal{R}^{N\times R}\), \(X \in \mathcal{R}^{R\times S}\) and \(B \in \mathcal{R}^{S \times M}\) and you want to isolate \(X\) as a column vector. Then you have:
\[
\begin{aligned}
\cvec(W) = \cvec(AXB) &= (B^T \otimes I_N)\,\cvec(AX) \\
&= (B^T \otimes I_N)(I_S \otimes A)\,\cvec(X) \\
&= (B^T \otimes A)\,\cvec(X)
\end{aligned}
\]
The step in the third line uses the Kronecker product identity:
\[ (A_1 \otimes B_1)(A_2 \otimes B_2) = (A_1 A_2) \otimes (B_1 B_2) \]
The same steps can also be applied if you want row-major storage of \(X\):
\[
\begin{aligned}
\rvec(W) = \rvec(AXB) &= (A \otimes I_M)\,\rvec(XB) \\
&= (A \otimes I_M)(I_R \otimes B^T)\,\rvec(X) \\
&= (A \otimes B^T)\,\rvec(X)
\end{aligned}
\]
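Both unburying identities can be checked directly in numpy (a self-contained sketch on random matrices with the dimensions used above):

```python
import numpy as np

cvec = lambda Z: Z.T.ravel()   # stack columns
rvec = lambda Z: Z.ravel()     # stack rows

rng = np.random.default_rng(1)
N, R, S, M = 2, 3, 4, 5
A = rng.standard_normal((N, R))
X = rng.standard_normal((R, S))
B = rng.standard_normal((S, M))
W = A @ X @ B

# column-major: cvec(AXB) = (B^T kron A) cvec(X)
assert np.allclose(cvec(W), np.kron(B.T, A) @ cvec(X))
# row-major: rvec(AXB) = (A kron B^T) rvec(X)
assert np.allclose(rvec(W), np.kron(A, B.T) @ rvec(X))
```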
In these cases the dimensions work out correctly, but you don't need to worry too much about them since the dimensions of \(X\) are constrained by the column count of \(A\) and the row count of \(B\). This only applies when operating with them symbolically, of course.
Frobenius Norm Problems
With this information, it becomes pretty easy to deal with Frobenius norm problems like:
\[ X^* = \arg\min_X \left\| A X B - Y \right\|_F^2 \]
I have found these frustrating in the past since I invariably end up in indexing hell. They become much easier with the identities above: applying them isolates the \(X\) term on the right-hand side of the expression and also converts the Frobenius norm into an \(L_2\) norm. Making the substitutions \(\tilde{x} = \cvec(X)\) and \(\tilde{y} = \cvec(Y)\),
\[ \tilde{x}^* = \arg\min_{\tilde{x}} \left\| (B^T \otimes A)\,\tilde{x} - \tilde{y} \right\|_2^2 \]
This now looks like a standard least-squares problem, albeit one with a Kronecker product mashed into it. Setting the gradient to zero gives the solution as:
\[ \tilde{x} = \left[ (B^T \otimes A)^T (B^T \otimes A) \right]^{-1} (B^T \otimes A)^T \tilde{y} = \left( B B^T \otimes A^T A \right)^{-1} \left( B \otimes A^T \right) \tilde{y} \]
This result has a similar form to \(\cvec(AXB) = \left( B^T \otimes A\right)\cvec(X)\), which can be used to get back to the matrix form of the expression:
\[ X = \left( A^T A \right)^{-1} A^T\, Y\, B^T \left( B B^T \right)^{-1} \]
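As a sketch of the whole pipeline, the following numpy snippet solves the vectorized least-squares problem and checks it against the closed-form matrix solution (the reshape back to \(X\) is the `cmat` operation from earlier):

```python
import numpy as np

cvec = lambda Z: Z.T.ravel()   # stack columns

rng = np.random.default_rng(2)
N, R, S, M = 6, 3, 4, 7        # overdetermined: N >= R and M >= S
A = rng.standard_normal((N, R))
B = rng.standard_normal((S, M))
Y = rng.standard_normal((N, M))

# Vectorized form: min || (B^T kron A) x - cvec(Y) ||_2
x_tilde, *_ = np.linalg.lstsq(np.kron(B.T, A), cvec(Y), rcond=None)
X_vec = x_tilde.reshape((S, -1)).T            # cmat(x_tilde, nc=S)

# Closed form: X = (A^T A)^{-1} A^T Y B^T (B B^T)^{-1}
X_closed = np.linalg.solve(A.T @ A, A.T @ Y @ B.T) @ np.linalg.inv(B @ B.T)

assert np.allclose(X_vec, X_closed)
```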
It's also possible to come at this from a different direction, working directly with the matrix form of the objective:
\[ f(X) = \left\| A X B - Y \right\|_F^2 = \operatorname{tr}\!\left[ (AXB - Y)^T (AXB - Y) \right] \]
The gradient of this is given in the Matrix Cookbook:
\[ \frac{\partial f}{\partial X} = 2\, A^T \left( A X B - Y \right) B^T \]
Setting the gradient to zero and isolating \(X\) gives the following:
\[ A^T A\, X\, B B^T = A^T Y B^T \quad\Longrightarrow\quad X = \left( A^T A \right)^{-1} A^T\, Y\, B^T \left( B B^T \right)^{-1} \]
which matches the previous result.
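The matrix gradient is also easy to sanity-check numerically: since the objective is quadratic, a central finite difference should reproduce it almost exactly (a self-contained numpy sketch):

```python
import numpy as np

rng = np.random.default_rng(3)
N, R, S, M = 4, 3, 3, 5
A = rng.standard_normal((N, R))
X = rng.standard_normal((R, S))
B = rng.standard_normal((S, M))
Y = rng.standard_normal((N, M))

f = lambda Z: np.sum((A @ Z @ B - Y) ** 2)    # Frobenius norm squared
grad = 2 * A.T @ (A @ X @ B - Y) @ B.T        # cookbook gradient

# Central finite differences, entry by entry
eps = 1e-6
num = np.zeros_like(X)
for i in range(R):
    for j in range(S):
        E = np.zeros_like(X)
        E[i, j] = eps
        num[i, j] = (f(X + E) - f(X - E)) / (2 * eps)

assert np.allclose(grad, num, atol=1e-4)
```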