James Gregson's Website

Jun 09, 2022

Reordering Matrix Products II

Given two matrices \(A \in \mathcal{R}^{M\times S}\) and \(B \in \mathcal{R}^{S \times N}\) it is helpful to be able to reorder their product \(A B \in \mathcal{R}^{M\times N}\) in order to compute Jacobians for optimization. For this it is common to define the \(\mbox{vec}(.)\) operator which flattens a matrix into a vector by concatenating columns:

\begin{equation*} \newcommand{\vec}[1]{\mbox{vec}{\left(#1\right)}} \vec{A} = \begin{bmatrix} A_{11} \\ \vdots \\ A_{M1} \\ \vdots \\ A_{1S} \\ \vdots \\ A_{MS}\end{bmatrix} \end{equation*}

Using this operator, the (flattened) product \(AB\) can be written as one of:

\begin{align*} \vec{AB} &=& (B^T \otimes I_A)\vec{A} \\ &=& (I_B \otimes A)\vec{B} \end{align*}

where \(\otimes\) is the Kronecker Product and \(I_A\) is an identity matrix with the same number of rows as \(A\) and \(I_B\) is an idenity matrix with the same number of columns as \(B\).

From the two options for expressing the product, it's clear that \((B^T \otimes I_A)\) is the (vectorized) Jacobian of \(AB\) with respect to \(A\) and \((I_B \otimes A)\) is the corresponding vectorized Jacobian with respect to \(B\).

That's great but if our linear algebra library happens to represent matrices in row-major order it will be necessary to sprinkle a large number of transposes in to get things in the right order to apply these formulas. That will be tedious, error-prone and inefficient. Fortunately, equivalent formulas are available for row-major ordering by defining a \(\mbox{ver}(.)\) operator that flattens a matrix by concatenating rows to form a column vector:

\begin{equation*} \newcommand{\ver}[1]{\mbox{ver}{(#1)}} \ver{A} = \begin{bmatrix} A_{11} \\ \vdots \\ A_{1S} \\ \vdots \\ A_{M1} \\ \vdots \\ A_{MS} \end{bmatrix} \end{equation*}

Given this it is fairly quick to work out that the equivalent to the first two formulas using the \(\ver{.}\) operator are:

\begin{align*} \ver{AB} &=& (I_A \otimes B^T)\ver{A} \\ &=& ( A \otimes I_B)\ver{B} \end{align*}

Using the same logic it can be concluded that \((I_A \otimes B^T)\) is the vectorized Jacobian of \(AB\) with respect to \(A\) and \((A \otimes I_B)\) is the corresponding Jacobian with respect to \(B\).

It's also worth noting that these are simplifications of a more general case for the product \(AXB\) that brings \(X\) to the right of the expression. The corresponding \(\vec{.}\) and \(\ver{.}\) versions are:

\begin{align*} \vec{AXB} &=& (B^T \otimes A) \vec{X} \\ \ver{AXB} &=& (A \otimes B^T) \ver{X} \end{align*}

which makes it clear where the identity matrices \(I_A\) and \(I_B\) in the previous expressions originate.

All these can be demonstrated by the following python code sample:

from typing import Tuple
import numpy as np

def vec( M: np.ndarray ):
    '''Flatten argument by columns'''
    return M.flatten('F')

def ivec( v: np.ndarray, shape: Tuple[int,int] ):
    '''Inverse of vec'''
    return np.reshape( v, shape, 'F' )

def ver( M: np.ndarray ):
    '''Flatten argument by rows'''
    return M.flatten('C')

def iver( v: np.ndarray, shape: Tuple[int,int] ):
    '''Inverse of ver'''
    return np.reshape( v, shape, 'C' )

def test_vec_ivec():
    A = np.random.standard_normal((3,4))
    assert np.allclose( A, ivec( vec( A ), A.shape ) )

def test_ver_iver():
    A = np.random.standard_normal((3,4))
    assert np.allclose( A, iver( ver(A), A.shape ) )

def test_products():
    A = np.random.standard_normal((3,4))
    X = np.random.standard_normal((4,4))
    B = np.random.standard_normal((4,5))

    AB1 = A@B

    Ia = np.eye( A.shape[0] )
    Ib = np.eye( B.shape[1] )

    AB2a = ivec( np.kron( B.T, Ia )@vec(A), (A.shape[0],B.shape[1]) )
    AB2b = ivec( np.kron(  Ib,  A )@vec(B), (A.shape[0],B.shape[1]) )

    assert np.allclose( AB1, AB2a )
    assert np.allclose( AB1, AB2b )

    AB3a = iver( np.kron( A,   Ib )@ver(B), (A.shape[0], B.shape[1]) )
    AB3b = iver( np.kron( Ia, B.T )@ver(A), (A.shape[0], B.shape[1]) )

    assert np.allclose( AB1, AB3a )
    assert np.allclose( AB1, AB3b )

    AXB1 = A@X@B
    AXB2 = ivec( np.kron(B.T,A)@vec(X), (A.shape[0],B.shape[1]) )
    AXB3 = iver( np.kron(A,B.T)@ver(X), (A.shape[0],B.shape[1]) )

    assert np.allclose( AXB1, AXB2 )
    assert np.allclose( AXB1, AXB3 )

if __name__ == '__main__':
    test_vec_ivec()
    test_ver_iver()
    test_products()

Hopefully this is helpful to someone.