Reordering Matrix Products II
Given two matrices \(A \in \mathcal{R}^{M\times S}\) and \(B \in \mathcal{R}^{S \times N}\) it is helpful to be able to reorder their product \(A B \in \mathcal{R}^{M\times N}\) in order to compute Jacobians for optimization. For this it is common to define the \(\mbox{vec}(.)\) operator which flattens a matrix into a vector by concatenating columns:
Using this operator, the (flattened) product \(AB\) can be written as one of:
where \(\otimes\) is the Kronecker Product and \(I_A\) is an identity matrix with the same number of rows as \(A\) and \(I_B\) is an idenity matrix with the same number of columns as \(B\).
From the two options for expressing the product, it's clear that \((B^T \otimes I_A)\) is the (vectorized) Jacobian of \(AB\) with respect to \(A\) and \((I_B \otimes A)\) is the corresponding vectorized Jacobian with respect to \(B\).
That's great but if our linear algebra library happens to represent matrices in row-major order it will be necessary to sprinkle a large number of transposes in to get things in the right order to apply these formulas. That will be tedious, error-prone and inefficient. Fortunately, equivalent formulas are available for row-major ordering by defining a \(\mbox{ver}(.)\) operator that flattens a matrix by concatenating rows to form a column vector:
Given this it is fairly quick to work out that the equivalent to the first two formulas using the \(\ver{.}\) operator are:
Using the same logic it can be concluded that \((I_A \otimes B^T)\) is the vectorized Jacobian of \(AB\) with respect to \(A\) and \((A \otimes I_B)\) is the corresponding Jacobian with respect to \(B\).
It's also worth noting that these are simplifications of a more general case for the product \(AXB\) that brings \(X\) to the right of the expression. The corresponding \(\vec{.}\) and \(\ver{.}\) versions are:
which makes it clear where the identity matrices \(I_A\) and \(I_B\) in the previous expressions originate.
All these can be demonstrated by the following python code sample:
from typing import Tuple
import numpy as np
def vec( M: np.ndarray ):
'''Flatten argument by columns'''
return M.flatten('F')
def ivec( v: np.ndarray, shape: Tuple[int,int] ):
'''Inverse of vec'''
return np.reshape( v, shape, 'F' )
def ver( M: np.ndarray ):
'''Flatten argument by rows'''
return M.flatten('C')
def iver( v: np.ndarray, shape: Tuple[int,int] ):
'''Inverse of ver'''
return np.reshape( v, shape, 'C' )
def test_vec_ivec():
A = np.random.standard_normal((3,4))
assert np.allclose( A, ivec( vec( A ), A.shape ) )
def test_ver_iver():
A = np.random.standard_normal((3,4))
assert np.allclose( A, iver( ver(A), A.shape ) )
def test_products():
A = np.random.standard_normal((3,4))
X = np.random.standard_normal((4,4))
B = np.random.standard_normal((4,5))
AB1 = A@B
Ia = np.eye( A.shape[0] )
Ib = np.eye( B.shape[1] )
AB2a = ivec( np.kron( B.T, Ia )@vec(A), (A.shape[0],B.shape[1]) )
AB2b = ivec( np.kron( Ib, A )@vec(B), (A.shape[0],B.shape[1]) )
assert np.allclose( AB1, AB2a )
assert np.allclose( AB1, AB2b )
AB3a = iver( np.kron( A, Ib )@ver(B), (A.shape[0], B.shape[1]) )
AB3b = iver( np.kron( Ia, B.T )@ver(A), (A.shape[0], B.shape[1]) )
assert np.allclose( AB1, AB3a )
assert np.allclose( AB1, AB3b )
AXB1 = A@X@B
AXB2 = ivec( np.kron(B.T,A)@vec(X), (A.shape[0],B.shape[1]) )
AXB3 = iver( np.kron(A,B.T)@ver(X), (A.shape[0],B.shape[1]) )
assert np.allclose( AXB1, AXB2 )
assert np.allclose( AXB1, AXB3 )
if __name__ == '__main__':
test_vec_ivec()
test_ver_iver()
test_products()
Hopefully this is helpful to someone.